Utku Kepir
Application of Approximate Computing Techniques in Large Language Models.
Supervisors: Alessandro Savino, Stefano Di Carlo. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2025
Abstract
Large Language Models (LLMs) have recently achieved state-of-the-art performance in a wide range of natural language processing tasks, but their rapid growth in size has introduced severe challenges in terms of computational cost, memory consumption, and energy efficiency. This makes their deployment in resource-constrained environments increasingly difficult and has motivated research into approximation strategies that trade exactness for efficiency. The first half of this thesis presents an extensive survey of approximate computing methods for transformer-based architectures, focusing on techniques such as quantization, pruning, low-rank adaptation (LoRA), stochastic perturbations, and stochastic memory masking. Alongside the survey, a benchmarking framework was developed to evaluate these approaches in a consistent and comparable manner.
The framework integrates support for multiple datasets, including Alpaca, Databricks-Dolly-15k, and AgentInstruct, and provides metrics such as BLEU score, ROUGE-L score, F1 score, inference time, output size, and perplexity.
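As a purely illustrative sketch (not code taken from the thesis or its framework), the snippet below shows how two of the elements mentioned in the abstract fit together: it measures the perplexity of a small causal language model with the Hugging Face transformers library, applies PyTorch post-training dynamic quantization, and measures perplexity again. The model name facebook/opt-125m, the sample sentence, and the choice of quantizing only nn.Linear modules are placeholder assumptions; the appropriate module types depend on the architecture.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM from the Hugging Face hub that uses
# nn.Linear layers can be substituted.
model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def perplexity(m, text: str) -> float:
    # Reusing the input ids as labels makes the model return the mean
    # cross-entropy loss; perplexity is the exponential of that loss.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = m(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

sample = "Large language models trade exactness for efficiency."
print("FP32 perplexity:", perplexity(model, sample))

# Post-training dynamic quantization: weights of the selected module types
# are stored in int8 and dequantized on the fly during inference, reducing
# memory use on CPU at the cost of a small accuracy (perplexity) penalty.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print("INT8 perplexity:", perplexity(quantized, sample))
```

Comparing the two perplexity values on a held-out text is the same quality-versus-efficiency trade-off the benchmarking framework is described as evaluating, here reduced to a single metric and a single approximation technique.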
