Enhancing Embedding Models through Specialized Finetuning in the banking sector

Enrico Capuano

Enhancing Embedding Models through Specialized Finetuning in the banking sector.

Supervisors: Daniele Apiletti, Claudia Berloco. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024

Abstract:

The rapid advancement of Generative AI and Natural Language Processing has led to the widespread adoption of embedding models in many applications, including question-answering systems. These systems represent words and sentences as embeddings in order to retrieve relevant information. However, open-source pre-trained multipurpose embedding models may not capture the nuances of specific domains, such as the banking sector. This study investigates the benefits of fine-tuning pre-trained embedding models on a dedicated dataset to improve their performance in such domains. In detail, we build a dataset from proprietary banking-sector documents and split it into training and test sets. The test set is used to compare different pre-trained open-source multipurpose embedding models, and the training set is used to fine-tune them. Performance is also evaluated within a retrieval-augmented generation (RAG) pipeline. The results are compared with those of the original multipurpose model to assess the impact of fine-tuning on sentence comprehension and retrieval. Analyzing the performance of the fine-tuned models helps clarify how embedding models can be tailored to the needs of specific industries and applications.
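
As an illustration of the kind of comparison the abstract describes, the sketch below scores retrieval quality with recall@k over cosine similarity. It is not the thesis's method, whose full text is not available: the metric choice, the function names (`cosine`, `top_k`, `recall_at_k`) and the hand-written two-dimensional vectors standing in for the outputs of a base and a fine-tuned model are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, doc_vecs, k):
    """Indices of the k documents most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(query_vecs, doc_vecs, relevant, k):
    """Fraction of queries whose single relevant document is in the top k."""
    hits = sum(
        1
        for query_vec, doc_index in zip(query_vecs, relevant)
        if doc_index in top_k(query_vec, doc_vecs, k)
    )
    return hits / len(query_vecs)


# Hypothetical toy data: query i should retrieve document relevant[i].
relevant = [0, 1, 2]

# Assumed "base model" embeddings: documents 0 and 1 lie close together,
# so the query meant for document 1 is pulled towards document 0.
docs_base = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
queries_base = [[1.0, 0.05], [0.95, 0.05], [0.1, 1.0]]

# Assumed "fine-tuned model" embeddings: the same items, better separated.
docs_tuned = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
queries_tuned = [[1.0, 0.1], [0.6, 0.8], [0.1, 1.0]]

base_score = recall_at_k(queries_base, docs_base, relevant, k=1)
tuned_score = recall_at_k(queries_tuned, docs_tuned, relevant, k=1)
print(f"recall@1 base:       {base_score:.2f}")
print(f"recall@1 fine-tuned: {tuned_score:.2f}")
```

In practice the vectors would come from encoding held-out test questions and document passages with each model, and the same scoring would be applied to both so that any difference reflects the fine-tuning alone.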

Supervisors: Daniele Apiletti, Claudia Berloco
Academic year: 2024/25
Publication type: Electronic
Number of pages: 63
Additional information: Confidential thesis. Full text not available
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New regulations > Master's degree > LM-32 - Computer Engineering
Collaborating companies: INTESA SANPAOLO SpA
URI: http://webthesis.biblio.polito.it/id/eprint/33778