Can you hear what I’ve learned? Explaining audio transformer-based models through embedding sonification

Gabriele Tomatis

Supervisors: Eliana Pastor, Alkis Koudounas. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

PDF (Tesi_di_laurea) - Thesis (8MB)
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Since their introduction, transformer models have shown high performance in the analysis of structured data such as images, time series, and audio. Their ability to solve a wide range of tasks rapidly made them the state of the art in many domains. How they reason, however, remains an open problem, because they translate input data into an embedding representation that only the model itself can interpret. Despite this, only a few works have tried to address the problem with methods from the field of Explainable AI. The aim of this discipline is to make AI models interpretable, and therefore trustworthy and reliable, which is impossible if we do not understand how the models reason. To address these issues, we take advantage of Descript Audio VAE, a model specifically trained to compress an audio waveform into a latent space representation and reconstruct it. In particular, we apply a gating layer between the embedding space of the model that we want to interpret and the latent space of Descript Audio VAE. This mapping converts the embeddings of the audio transformer under analysis into latents that Descript Audio VAE can decode into sound: something that we can interpret.
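
As an illustration only, the mapping described in the abstract can be sketched as a gated linear projection from embedding frames to latent frames. The abstract does not specify the form of the gating layer, so everything here is an assumption: the class name `GatedMapping`, the sigmoid gate, the dimensions (768 for embeddings, 64 for latents), and the random stand-in data are all hypothetical. The weights are random and untrained, time alignment between the two models is ignored, and decoding the latents into audio with the pretrained Descript Audio VAE decoder is not shown.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


class GatedMapping:
    """Hypothetical gated linear map from transformer embeddings to VAE latents.

    This is an assumed form, not the layer used in the thesis:
        latents = sigmoid(E @ W_gate + b_gate) * (E @ W_value + b_value)
    """

    def __init__(self, embed_dim, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(embed_dim)
        # Random, untrained weights; a real mapping would be learned.
        self.w_value = rng.normal(0.0, scale, (embed_dim, latent_dim))
        self.b_value = np.zeros(latent_dim)
        self.w_gate = rng.normal(0.0, scale, (embed_dim, latent_dim))
        self.b_gate = np.zeros(latent_dim)

    def __call__(self, embeddings):
        # embeddings: (num_frames, embed_dim)
        value = embeddings @ self.w_value + self.b_value
        gate = sigmoid(embeddings @ self.w_gate + self.b_gate)
        # The gate, in (0, 1), scales each latent dimension per frame.
        return gate * value


# Toy stand-in for the embeddings of the model under analysis
# (assumed shape: 50 frames, 768 dimensions).
embeddings = np.random.default_rng(1).normal(size=(50, 768))

mapping = GatedMapping(embed_dim=768, latent_dim=64)
latents = mapping(embeddings)
print(latents.shape)  # (50, 64)
```

In this sketch the output has one latent vector per embedding frame; the resulting array is what would be handed to the audio decoder to obtain a waveform.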

Supervisors: Eliana Pastor, Alkis Koudounas
Academic year: 2024/25
Publication type: Electronic
Number of pages: 73
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New regulations > Master's degree > LM-32 - Computer Engineering
Collaborating organizations: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/35389