Enrico Capuano
Enhancing Embedding Models through Specialized Finetuning in the banking sector.
Supervisors: Daniele Apiletti, Claudia Berloco. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2024
Abstract: |
The rapid advancement of Generative AI and Natural Language Processing has led to the widespread adoption of embedding models in many applications, including question-answering systems. These systems represent words and sentences as embeddings in order to retrieve relevant information. However, open-source, pre-trained, multipurpose embedding models may fail to capture the nuances of specific domains such as the banking sector. This study investigates the benefits of fine-tuning pre-trained embedding models on a dedicated dataset to improve their performance in such domains. In detail, we build a dataset from proprietary banking-sector documents and split it into training and test sets: the training set is used to fine-tune the models, and the test set is used to evaluate several pre-trained open-source multipurpose embedding models and their fine-tuned counterparts. Performance is also evaluated within a retrieval-augmented generation (RAG) pipeline. The results are compared with those of the original multipurpose models to assess the impact of fine-tuning on sentence comprehension and retrieval. Analyzing the performance of the fine-tuned models helps clarify how embedding models can be tailored to the specific needs of different industries and applications. |
---|---|
Supervisors: | Daniele Apiletti, Claudia Berloco |
Academic year: | 2024/25 |
Publication type: | Electronic |
Number of pages: | 63 |
Additional information: | Confidential thesis. Full text not available |
Subjects: | |
Degree programme: | Master's degree programme in Data Science and Engineering |
Degree class: | New regulations > Master's degree > LM-32 - COMPUTER ENGINEERING |
Collaborating companies: | INTESA SANPAOLO SpA |
URI: | http://webthesis.biblio.polito.it/id/eprint/33778 |
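The abstract describes embedding-based retrieval and a comparison of retrieval quality before and after fine-tuning, but it does not say which metrics were used, and the full text is confidential. The following is only a minimal sketch of how such a comparison could be scored, assuming precomputed embeddings, cosine similarity as the ranking function, and exactly one relevant document per test query; it computes hit rate@k and mean reciprocal rank (MRR). All function names and the toy vectors are hypothetical and are not taken from the thesis.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_vec: Vector, doc_vecs: Dict[str, Vector]) -> List[str]:
    """Return document ids sorted by decreasing cosine similarity to the query."""
    return sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )


def evaluate_retrieval(
    query_vecs: Dict[str, Vector],
    relevant: Dict[str, str],
    doc_vecs: Dict[str, Vector],
    k: int = 1,
) -> Dict[str, float]:
    """Compute hit rate@k and MRR over held-out test queries.

    Hypothetical setup: each query has exactly one relevant document.
    """
    hits = 0
    reciprocal_rank_sum = 0.0
    for query_id, query_vec in query_vecs.items():
        ranking = rank_documents(query_vec, doc_vecs)
        # 1-based position of the relevant document in the ranking
        rank = ranking.index(relevant[query_id]) + 1
        if rank <= k:
            hits += 1
        reciprocal_rank_sum += 1.0 / rank
    n = len(query_vecs)
    return {"hit_rate_at_k": hits / n, "mrr": reciprocal_rank_sum / n}


# Hypothetical toy embeddings standing in for a model's output.
DOCS = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
QUERIES = {"q1": [0.9, 0.1], "q2": [0.1, 0.9]}
RELEVANT = {"q1": "d1", "q2": "d3"}

if __name__ == "__main__":
    # prints {'hit_rate_at_k': 0.5, 'mrr': 0.75}
    print(evaluate_retrieval(QUERIES, RELEVANT, DOCS, k=1))
```

Under these assumptions, a base model and its fine-tuned counterpart would be compared by calling `evaluate_retrieval` twice on the same held-out test queries and documents, once with each model's embeddings, so that any difference reflects the embeddings rather than the data split.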