
Fine-tuning Deep Language Models for Zero-Shot Text Classification

Elia Fontana

Fine-tuning Deep Language Models for Zero-Shot Text Classification.

Supervisors: Paolo Garza, Lorenzo Bongiovanni. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2023

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

High-quality Zero-Shot Text Classification is one of the holy grails of NLP because it avoids the difficult, time-consuming and expensive process of collecting and labelling data for supervised training. Deep language models have shown remarkable capabilities in various natural language processing tasks, but their effectiveness in Zero-Shot Text Classification remains an area of exploration. Large language models (LLMs), e.g., GPT-4 and LaMDA, have shown stunning generalization capabilities, but they are not open-source and are intractable with normal computing resources. The aim of this thesis is to analyze this task in greater depth in the context of tractable, open-source language models. In particular, we focus on MPNet, a language model pre-trained on extensive general corpora and specialized on the task of Semantic Textual Similarity (STS). We explore the advantages of applying a supervised contrastive learning objective during the fine-tuning phase to address the challenge of Zero-Shot Text Classification. The main focus of this work is on enhancing the model’s Zero-Shot capability by generating a better-suited vector-based representation for short texts, such as noun phrases, that are used as labels. Given a document, such as a scientific paper or journal article, consisting of a title and a description, noun phrases are extracted from the title. The aim of the framework is to generate embeddings for these short-text keywords such that they are as close as possible, in the semantic vector space, to the embedding of the associated long-text description. Furthermore, an analysis of the alignment and distribution uniformity of these generated vectors is conducted to gain a deeper understanding of the semantic vector space produced by MPNet during fine-tuning.
By shedding light on these aspects, this thesis contributes to a deeper understanding of Zero-Shot Text Classification and presents novel insights that may pave the way for enhancing the performance and capabilities of deep language models on this task.
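
The ideas named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the in-batch InfoNCE-style loss, the temperature value, and the alignment and uniformity definitions (squared distance between positive pairs, and the log of the mean Gaussian potential over all pairs) are assumptions chosen for illustration. Plain Python lists stand in for MPNet embeddings.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def zero_shot_classify(doc_vec, label_vecs):
    """Return the index of the label embedding most cosine-similar to the document."""
    d = normalize(doc_vec)
    sims = [dot(d, normalize(l)) for l in label_vecs]
    return max(range(len(sims)), key=sims.__getitem__)

def info_nce_loss(short_vecs, long_vecs, temperature=0.05):
    """In-batch contrastive loss (assumed form): short_vecs[i] should match
    long_vecs[i]; every other description in the batch acts as a negative."""
    s = [normalize(v) for v in short_vecs]
    l = [normalize(v) for v in long_vecs]
    total = 0.0
    for i, si in enumerate(s):
        logits = [dot(si, lj) / temperature for lj in l]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(s)

def alignment(short_vecs, long_vecs):
    """Mean squared distance between normalized positive pairs (lower is better)."""
    pairs = zip((normalize(v) for v in short_vecs),
                (normalize(v) for v in long_vecs))
    dists = [sum((a - b) ** 2 for a, b in zip(s, l)) for s, l in pairs]
    return sum(dists) / len(dists)

def uniformity(vecs, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs (lower is more uniform)."""
    unit = [normalize(v) for v in vecs]
    potentials = [
        math.exp(-t * sum((a - b) ** 2 for a, b in zip(unit[i], unit[j])))
        for i in range(len(unit)) for j in range(i + 1, len(unit))
    ]
    return math.log(sum(potentials) / len(potentials))

# Toy 2-D "embeddings": two noun-phrase labels and their long descriptions.
labels = [[1.0, 0.0], [0.0, 1.0]]
descriptions = [[0.9, 0.1], [0.1, 0.9]]
predicted = zero_shot_classify([0.2, 0.8], labels)
loss = info_nce_loss(labels, descriptions)
```

In this toy setup the loss is small when each label lies close to its own description and grows when the pairing is shuffled, which is the behaviour a contrastive fine-tuning objective is meant to reward.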

Supervisors: Paolo Garza, Lorenzo Bongiovanni
Academic year: 2023/24
Publication type: Electronic
Number of Pages: 61
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: FONDAZIONE LINKS
URI: http://webthesis.biblio.polito.it/id/eprint/29355