Vittorio Pellegrini
Self-Supervised Fine-Tuning of sentence embedding models using a Smooth Inverse Frequency model.
Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2023
Abstract
Sentence embedding models play a key role in Natural Language Processing. They can be used for several tasks, such as sentence paraphrasing, sentence similarity, and sentence clustering. Fine-tuning pre-trained models for sentence embedding extraction is a common practice that allows them to reach state-of-the-art performance on downstream tasks. However, this practice usually requires labeled data sets. This thesis aims to overcome this limitation by introducing a novel technique that automatically creates a target set for fine-tuning sentence embedding models on a specific downstream task. The technique is evaluated on three distinct tasks: sentence paraphrasing, sentence similarity, and sentence clustering.
The results show a significant improvement in sentence embedding models when the Smooth Inverse Frequency technique is used for automatic extraction and labeling of sentence pairs.
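The abstract does not spell out how the sentence pairs are extracted and labeled, so the following is only a minimal sketch of how a Smooth Inverse Frequency (SIF) model could be used for that purpose. It assumes the commonly used SIF formulation (word vectors weighted by a / (a + p(w)), followed by removal of the first principal component). The function names `sif_embeddings` and `label_pairs`, the thresholds `hi` and `lo`, the random toy word vectors, and the estimation of word probabilities from the input sentences themselves are all illustrative assumptions, not details taken from the thesis.

```python
import numpy as np
from collections import Counter


def sif_embeddings(sentences, word_vectors, a=1e-3):
    """Return SIF sentence embeddings for whitespace-tokenized sentences.

    Assumption: unigram probabilities p(w) are estimated from `sentences`.
    """
    tokens = [s.lower().split() for s in sentences]
    counts = Counter(w for toks in tokens for w in toks)
    total = sum(counts.values())
    dim = len(next(iter(word_vectors.values())))

    emb = np.zeros((len(sentences), dim))
    for i, toks in enumerate(tokens):
        known = [w for w in toks if w in word_vectors]
        if not known:
            continue  # leave an all-zero row for sentences with no known words
        # SIF weight: frequent words get small weights, rare words larger ones
        weights = np.array([a / (a + counts[w] / total) for w in known])
        vecs = np.array([word_vectors[w] for w in known])
        emb[i] = weights @ vecs / len(known)

    # Common component removal: subtract the projection onto the first
    # right singular vector of the embedding matrix
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    pc = vt[0]
    return emb - np.outer(emb @ pc, pc)


def label_pairs(sentences, emb, hi=0.8, lo=0.2):
    """Label pairs 1 (similar) or 0 (dissimilar) by cosine similarity.

    `hi` and `lo` are hypothetical thresholds; pairs in between are skipped.
    """
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    pairs = []
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if sim[i, j] >= hi:
                pairs.append((sentences[i], sentences[j], 1))
            elif sim[i, j] <= lo:
                pairs.append((sentences[i], sentences[j], 0))
    return pairs


# Toy usage with random word vectors (stand-ins for real pre-trained vectors)
sentences = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "stocks fell sharply today",
    "a dog ran in the park",
]
rng = np.random.default_rng(0)
vocab = sorted({w for s in sentences for w in s.lower().split()})
word_vectors = {w: rng.normal(size=16) for w in vocab}

emb = sif_embeddings(sentences, word_vectors)
pairs = label_pairs(sentences, emb)
```

The resulting `(sentence_a, sentence_b, label)` triples could then serve as a target set for fine-tuning a sentence embedding model with a pair-based objective. In practice the word vectors and word frequencies would come from a large corpus rather than from the toy data used here.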
