Explainable AI for Speech Data: From words to phoneme

Raul Gatto

Supervisors: Eliana Pastor, Alkis Koudounas. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2025

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

As speech-based technologies such as virtual assistants become increasingly present in daily life, the need for transparency and interpretability in these systems grows, driven by concerns about how they reach their decisions. Deep learning has significantly improved the performance of Automatic Speech Recognition (ASR) systems, but it has also made them more complex and opaque, turning them into black boxes. This thesis addresses these challenges by applying Explainable AI (XAI) techniques to ASR systems, moving from the current word-level explanations to a finer, phoneme-level granularity in order to identify the contributions of sub-word units, which have so far been unexplored.

To achieve this, the work adapts existing model-agnostic explanation methods such as Leave-One-Out (LOO), LIME, and SHAP, traditionally used to explain image and text classifiers, so that they can perform perturbations at the phoneme level.
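
As an illustration of the Leave-One-Out idea at phoneme granularity, the following is a minimal sketch and not the thesis implementation. It assumes the phoneme boundaries are already known (for example from a forced aligner), that a phoneme is perturbed by replacing its samples with silence, and that the model is exposed as a scoring function. The names `loo_phoneme_attributions`, `Segment`, and `toy_score` are hypothetical, and `toy_score` is only a stand-in for a real ASR model's confidence.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical segment format: (phoneme label, start sample, end sample).
Segment = Tuple[str, int, int]


def loo_phoneme_attributions(
    waveform: Sequence[float],
    segments: List[Segment],
    score_fn: Callable[[Sequence[float]], float],
) -> List[Tuple[str, float]]:
    """Leave-One-Out: attribute to each phoneme the score drop seen when it is removed."""
    baseline = score_fn(waveform)
    attributions = []
    for label, start, end in segments:
        perturbed = list(waveform)
        # Assumption: removal is simulated by zeroing the segment (silence).
        perturbed[start:end] = [0.0] * (end - start)
        attributions.append((label, baseline - score_fn(perturbed)))
    return attributions


def toy_score(wave: Sequence[float]) -> float:
    """Stand-in for a model score: energy of samples 2 and 3 only."""
    return sum(x * x for x in wave[2:4])


wave = [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
segs = [("k", 0, 2), ("ae", 2, 4), ("t", 4, 6)]
result = loo_phoneme_attributions(wave, segs, toy_score)
print(result)  # [('k', 0.0), ('ae', 8.0), ('t', 0.0)]
```

In this toy setup only the middle segment affects the score, so it receives the whole attribution. With a real model, the scoring function might instead return the probability of the originally predicted transcript or class, and other perturbations (such as noise) could replace silencing.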

Publication type

Electronic

URI

https://webthesis.biblio.polito.it/id/eprint/36342
