Enrico Capuano
Enhancing Embedding Models through Specialized Finetuning in the banking sector.
Supervisors: Daniele Apiletti, Claudia Berloco. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024
| Abstract: | The rapid advancement of Generative AI and Natural Language Processing has led to the widespread adoption of embedding models in various applications, including question-answering systems. These systems rely on embeddings of words and sentences to retrieve relevant information. However, open-source pre-trained multipurpose embedding models may not capture the nuances of specific domains, such as the banking sector. This study investigates the benefits of fine-tuning pre-trained embedding models on a dedicated dataset to improve their performance in such domains. In detail, we build a dataset from proprietary banking-sector documents and split it into training and test sets. The training set is used to fine-tune different pre-trained open-source multipurpose embedding models, and the test set is used to evaluate them. Performance is also evaluated in a retrieval-augmented generation (RAG) pipeline. The results are compared with those of the original multipurpose models to assess the impact of fine-tuning on sentence comprehension and retrieval. By analyzing the performance of the fine-tuned models, we can better understand how to tailor embedding models to the needs of different industries and applications. |
|---|---|
| Supervisors: | Daniele Apiletti, Claudia Berloco |
| Academic year: | 2024/25 |
| Publication type: | Electronic |
| Number of pages: | 63 |
| Additional information: | Thesis under embargo. Full text not available |
| Subjects: | |
| Degree programme: | Master's degree programme in Data Science And Engineering |
| Degree class: | New system > Master's degree > LM-32 - COMPUTER ENGINEERING |
| Collaborating companies: | INTESA SANPAOLO SpA |
| URI: | http://webthesis.biblio.polito.it/id/eprint/33778 |

Creative Commons License - Attribution 3.0 Italy