Speech-Text Cross-Modal Learning through Self-Attention Mechanisms

Damiano Bonaccorsi

Speech-Text Cross-Modal Learning through Self-Attention Mechanisms. Supervisors: Eliana Pastor, Alkis Koudounas, Moreno La Quatra. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2023.

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Speech, with elements such as intonation and non-verbal vocalisations, is considered the earliest form of human language. However, existing spoken language understanding systems mostly focus on its textual content and disregard these additional components. Recent advances in speech language modelling have enabled speech-based language models, known as SpeechLMs. Nevertheless, text remains the primary mode of communication on the internet. In this context, the objective of the thesis is to analyse current state-of-the-art speech models and to design a novel approach that combines the speech and text modalities, yielding an architecture capable of leveraging the advantages of both.

To do so, we adapt the approach of VisualBERT, a previous work that introduces a simple and flexible framework for modelling a wide range of vision-and-language tasks, to the speech and text modalities.
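The abstract describes the method only at a high level. VisualBERT is a single-stream model: it concatenates text token embeddings with image region embeddings, tags each with a segment embedding, and lets one BERT-style encoder apply self-attention across the whole sequence. The sketch below shows what that idea might look like with speech frames in place of image regions. All sizes, the random initialisation, the projection `speech_proj`, the position embeddings shared across both modalities, and the single attention head (standing in for a full stack of Transformer layers) are hypothetical choices, not details taken from the thesis.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only.
D_MODEL = 16     # shared hidden size of the joint sequence
VOCAB = 100      # text vocabulary size
SPEECH_DIM = 8   # size of each speech frame feature (e.g. from a pretrained speech encoder)
MAX_LEN = 64     # maximum joint sequence length

rng = np.random.default_rng(0)

# Randomly initialised parameters; in a real model these would be learned.
token_emb = rng.normal(0.0, 0.02, (VOCAB, D_MODEL))         # text token embeddings
speech_proj = rng.normal(0.0, 0.02, (SPEECH_DIM, D_MODEL))  # hypothetical projection of speech frames to D_MODEL
segment_emb = rng.normal(0.0, 0.02, (2, D_MODEL))           # segment 0 = text, 1 = speech
pos_emb = rng.normal(0.0, 0.02, (MAX_LEN, D_MODEL))         # positions shared across the joint sequence (assumption)
W_q, W_k, W_v = (rng.normal(0.0, D_MODEL ** -0.5, (D_MODEL, D_MODEL)) for _ in range(3))


def embed(token_ids, speech_feats):
    """Build one joint sequence: [text tokens ; speech frames]."""
    text = token_emb[token_ids] + segment_emb[0]
    speech = speech_feats @ speech_proj + segment_emb[1]
    x = np.concatenate([text, speech], axis=0)
    return x + pos_emb[: x.shape[0]]


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def self_attention(x):
    """Single-head scaled dot-product self-attention over the joint sequence."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    weights = softmax(q @ k.T / np.sqrt(D_MODEL))  # (L, L): every position attends to every other
    return weights @ v, weights


token_ids = np.array([1, 5, 7, 2])               # 4 hypothetical text tokens
speech_feats = rng.normal(size=(6, SPEECH_DIM))  # 6 hypothetical speech frames
out, weights = self_attention(embed(token_ids, speech_feats))
print(out.shape)                                 # (10, 16)
print(weights[:4, 4:].sum(axis=-1))              # attention mass each text token puts on speech frames
```

Because text and speech share a single sequence, the attention matrix contains text-to-speech and speech-to-text blocks, so cross-modal interactions are learned by ordinary self-attention without a dedicated cross-attention module.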

Supervisors

Eliana Pastor, Alkis Koudounas, Moreno La Quatra

Academic year

2023/24

Publication type

Electronic

Number of pages

Degree programme

Master's degree programme in Ingegneria Informatica (Computer Engineering)

Degree class

New regulations > Master's degree > LM-32 - Ingegneria Informatica (Computer Engineering)

URI

https://webthesis.biblio.polito.it/id/eprint/29585