Damiano Bonaccorsi
Speech-Text Cross-Modal Learning through Self-Attention Mechanisms.
Supervisors: Eliana Pastor, Alkis Koudounas, Moreno La Quatra. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2023
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Speech, with its various elements such as intonation and non-verbal vocalisations, is considered to be the earliest form of human language. However, existing systems for understanding spoken language mostly focus on the textual aspect, disregarding these additional components. Recent advancements in speech language modelling have enabled the development of speech-based language models called SpeechLMs. Nevertheless, text remains the primary mode of communication on the internet. Given this context, the objective of the thesis is to analyse the current state-of-the-art speech models and design a novel approach to combine the speech and text modalities, obtaining an architecture that is capable of leveraging the advantages of both.
To do so, we adapt the approach of VisualBERT, a previous work that introduces a simple and flexible framework for modelling a vast range of vision-and-text tasks, to the speech and text modalities.
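The general idea behind a VisualBERT-style, single-stream model can be illustrated with a small sketch. It is not the architecture of the thesis, whose details are not given in this abstract. It only shows the basic mechanism: embeddings from two modalities are tagged with a modality (segment) vector, concatenated into one sequence, and passed through self-attention so that every position can attend to both modalities. All names (`fuse`, `add_segment`, `text_segment`, `speech_segment`), the toy dimensions, the random inputs, and the use of identity projections are assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def add_segment(embeddings, segment):
    """Add a modality (segment) vector to every embedding in a sequence."""
    return [[e + s for e, s in zip(vec, segment)] for vec in embeddings]


def self_attention(sequence):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the inputs themselves
    (identity projections); a real model would learn these projections.
    """
    dim = len(sequence[0])
    outputs, weights = [], []
    for query in sequence:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in sequence
        ]
        attn = softmax(scores)
        weights.append(attn)
        outputs.append(
            [sum(a * value[d] for a, value in zip(attn, sequence)) for d in range(dim)]
        )
    return outputs, weights


def fuse(text_emb, speech_emb, text_segment, speech_segment):
    """Concatenate both modalities into one sequence and self-attend over it."""
    joint = add_segment(text_emb, text_segment) + add_segment(speech_emb, speech_segment)
    return self_attention(joint)


# Toy inputs (hypothetical): 3 text tokens and 5 speech frames, 4 dimensions each.
rng = random.Random(0)
DIM = 4
text_emb = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]
speech_emb = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
text_segment = [0.1] * DIM
speech_segment = [-0.1] * DIM

outputs, weights = fuse(text_emb, speech_emb, text_segment, speech_segment)
```

In this sketch each row of `weights` spans all eight positions, so a text token distributes its attention over both text tokens and speech frames. That cross-modal mixing within a single attention layer is the property the single-stream design relies on.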
