Ludovica Mazzucco
Transformer-based speech and text recognition models in the context of the Next-Generation Aircraft's Virtual Assistant.
Supervisor: Luigi De Russis. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2024
Abstract
This thesis presents possible implementations of two Machine Learning tasks, Speech To Text and Text To Intent paired with Named Entity Recognition, with a view to deploying them as core technologies of a Virtual Assistant running on board the next-generation fighter. It is the outcome of a six-month internship at Leonardo Labs, the R&D department of Leonardo SpA, based in Turin. For the Speech To Text module, the OpenAI Whisper neural network is used as the base model and fine-tuned on the downstream task, with a dataset built from audio recordings collected through a graphical user interface implemented with the Python framework Streamlit.
The behavior on the test set was compared across three models: the pre-trained model, the model fine-tuned on the clean dataset, and the model fine-tuned on the dataset with audio effects applied. The comparison shows that fine-tuning sharply reduces the error percentage and that, for some versions of the pre-trained Whisper, data augmentation improves results further.
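The abstract does not name the error metric used for the comparison. As an illustration only, the sketch below assumes it is word error rate (WER), a common choice for speech recognition: the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: lowercasing and whitespace splitting stand in for whatever
    text normalization the thesis actually applies.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic dynamic programming row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("two" -> "to") and one deletion ("zero")
# over five reference words.
wer = word_error_rate("set heading two seven zero", "set heading to seven")
print(f"WER: {wer:.0%}")
```

Under this assumption, averaging the score over the test set for the pre-trained model, the model fine-tuned on clean audio, and the model fine-tuned on augmented audio would give the three error percentages being compared.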
