Enrico Capuano
Enhancing Embedding Models through Specialized Finetuning in the banking sector.
Supervisors: Daniele Apiletti, Claudia Berloco. Politecnico di Torino, Master of science program in Data Science And Engineering, 2024
| Abstract: | The rapid advancement of Generative AI and Natural Language Processing has led to the widespread adoption of embedding models in many applications, including question-answering systems. These systems represent words and sentences as embeddings and use those representations to retrieve relevant information. However, open-source, pre-trained, multipurpose embedding models may fail to capture nuances specific to particular domains, such as the banking sector. This study investigates whether fine-tuning pre-trained embedding models on a dedicated dataset improves their performance in such domain-specific contexts. In detail, we build a dataset from proprietary banking-sector documents and split it into training and test sets. The training set is used to fine-tune the models, and the test set is used to evaluate several pre-trained open-source multipurpose embedding models and their fine-tuned versions. Performance is also evaluated within a retrieval-augmented generation (RAG) pipeline. The results of the fine-tuned models are compared with those of the original multipurpose models to assess the impact of fine-tuning on sentence comprehension and retrieval. Analyzing the performance of the fine-tuned models helps clarify how embedding models can be tailored to the unique needs of different industries and applications. |
|---|---|
| Supervisors: | Daniele Apiletti, Claudia Berloco |
| Academic year: | 2024/25 |
| Publication type: | Electronic |
| Number of Pages: | 63 |
| Additional Information: | Confidential thesis. Full text not available. |
| Subjects: | |
| Degree programme: | Master of science program in Data Science And Engineering |
| Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
| Collaborating companies: | INTESA SANPAOLO SpA |
| URI: | http://webthesis.biblio.polito.it/id/eprint/33778 |



Creative Commons License - Attribution 3.0 Italy