Utku Kepir
Application of Approximate Computing Techniques in Large Language Models.
Supervisors: Alessandro Savino, Stefano Di Carlo. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2025
Abstract
Large Language Models (LLMs) have recently achieved state-of-the-art performance in a wide range of natural language processing tasks, but their rapid growth in size has introduced severe challenges in terms of computational cost, memory consumption, and energy efficiency. This makes their deployment in resource-constrained environments increasingly difficult and has motivated research into approximation strategies that trade exactness for efficiency. The first half of this thesis presents an extensive survey of approximate computing methods for transformer-based architectures, focusing on techniques such as quantization, pruning, low-rank adaptation (LoRA), stochastic perturbations, and stochastic memory masking. Alongside the survey, a benchmarking framework was developed to evaluate these approaches in a consistent and comparable manner.
The framework integrates support for multiple datasets, including Alpaca, Databricks-Dolly-15k, and AgentInstruct, and provides metrics such as BLEU score, ROUGE-L score, F1 score, inference time, output size, and perplexity.
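As a purely illustrative sketch (not code taken from the thesis or its framework), the snippet below shows how two of the elements mentioned in the abstract fit together: it measures the perplexity of a small causal language model with the Hugging Face transformers library, applies PyTorch post-training dynamic quantization, and measures perplexity again. The model name facebook/opt-125m, the sample sentence, and the choice of quantizing only nn.Linear modules are placeholder assumptions; the appropriate module types depend on the architecture.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM from the Hugging Face hub that uses
# nn.Linear layers can be substituted.
model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def perplexity(m, text: str) -> float:
    # Reusing the input ids as labels makes the model return the mean
    # cross-entropy loss; perplexity is the exponential of that loss.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = m(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

sample = "Large language models trade exactness for efficiency."
print("FP32 perplexity:", perplexity(model, sample))

# Post-training dynamic quantization: weights of the selected module types
# are stored in int8 and dequantized on the fly during inference, reducing
# memory use on CPU at the cost of a small accuracy (perplexity) penalty.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print("INT8 perplexity:", perplexity(quantized, sample))
```

Comparing the two perplexity values on a held-out text is the same quality-versus-efficiency trade-off the benchmarking framework is described as evaluating, here reduced to a single metric and a single approximation technique.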
