Davide Vitabile
Enabling Edge AI: Synthetic Data Generation and Supervised Fine-Tuning for Small Language Models.
Supervisors: Antonio Jose' Di Scala, Subash Subavignesh Nachimuthu. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025
Abstract
The rapid development of Large Language Models (LLMs) has revolutionized natural language processing, enabling sophisticated human-computer interactions. However, the widespread deployment of these models in cloud environments raises significant concerns about privacy, data security, and computational efficiency. This thesis, conducted at Tether's Data Division, focuses on developing privacy-preserving AI solutions by creating efficient Small Language Models (SLMs) with 1-3 billion parameters, optimized for local deployment on edge devices. The research makes two main contributions to the development of high-performance SLMs. First, it introduces novel pipelines for generating high-quality synthetic data for post-training, which is essential for improving instruction following while reducing hallucinations.
This approach includes an instruction-tuning pipeline featuring generation, diversity, and validation stages, as well as a specialized zero-shot chain-of-thought pipeline to improve reasoning capabilities.
