Gabriele Tomatis
Can you hear what I’ve learned? Explaining audio transformer-based models through embedding sonification.
Supervisors: Eliana Pastor, Alkis Koudounas. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2025
PDF (Tesi_di_laurea) - Tesi. Licence: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Since their introduction, transformer models have demonstrated strong performance in the analysis of structured data such as images, time series, and audio. Their ability to solve the most diverse tasks quickly made them the state of the art in a wide variety of domains. How they reason, however, is still an open issue, as they translate these data into embedding representations that only they can comprehend. So far, only a few works have tried to tackle this problem using methods proposed in the field of Explainable AI. The aim of this discipline is to make AI models interpretable, and thus trustworthy and reliable; this is impossible to achieve if we do not understand how the models reason.
To address these issues, we take advantage of the Descript Audio VAE, a model specifically trained to compress and reconstruct an audio waveform by passing it through a latent-space representation.
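The encode-to-latent-then-decode pipeline described above can be sketched in miniature. The snippet below is an illustrative stand-in only: the actual Descript Audio VAE is a learned neural codec, whereas here a simple PCA-style linear autoencoder over waveform frames mimics the same compress/reconstruct flow (all function names and the 8x compression setting are assumptions for the sketch, not part of the thesis).

```python
import numpy as np

# Sketch of a latent-space audio codec: frame the waveform, project each
# frame onto a small learned basis (encode), then map back (decode).
# This is NOT the Descript Audio VAE, just a linear analogue of its pipeline.

def frame(wave, size):
    """Split a 1-D waveform into non-overlapping frames of `size` samples."""
    n = len(wave) // size
    return wave[: n * size].reshape(n, size)

def fit_linear_autoencoder(frames, latent_dim):
    """Learn a PCA basis: the top `latent_dim` right-singular vectors."""
    mean = frames.mean(axis=0)
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:latent_dim]          # basis: (latent_dim, frame_size)

def encode(frames, mean, basis):
    return (frames - mean) @ basis.T      # latent codes, one row per frame

def decode(latents, mean, basis):
    return latents @ basis + mean         # reconstructed frames

# Toy input: a noisy 220 Hz sinusoid, 4096 samples.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 4096, endpoint=False)
wave = np.sin(2 * np.pi * 220 * t) + 0.1 * rng.standard_normal(t.size)

frames = frame(wave, 64)                  # 64 frames of 64 samples
mean, basis = fit_linear_autoencoder(frames, latent_dim=8)
z = encode(frames, mean, basis)           # compressed representation
recon = decode(z, mean, basis)

ratio = frames.size / z.size              # 8x compression per frame
err = np.mean((frames - recon) ** 2)      # small: the sinusoid lives in
                                          # a low-dimensional subspace
```

A real neural codec replaces the linear projection with deep convolutional encoder/decoder networks, but the roles of `z` (the latent embedding that sonification methods operate on) and `decode` (the path back to audible audio) are the same.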