Computational Biases of Foundation Models for Speech Emotion Recognition: A Quantitative Analysis

Elena Di Felice


Supervisors: Giuseppe Rizzo, Federico D'Asaro. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

As Artificial Intelligence systems become more widely used in our daily lives, it is crucial to ensure not only their accuracy but also their fairness. In this study, I assess fairness and the possible presence of bias in systems that perform Speech Emotion Recognition (SER). SER is the process of automatically detecting and understanding the emotional content conveyed through spoken language. It relies on analyzing acoustic features of the speech signal, independently of the actual linguistic content. The experiments were conducted on the only two datasets available in Italian for this task, Emozionalmente and EMOVO.

I implemented the fairness metrics most commonly used in the literature (Disparate Impact, Statistical Parity, Average Odds, and Equal Opportunity) as well as two baselines to run the tests: a Support Vector Machine (SVM) model, considering two different feature extraction methods (MFCC and MFMC), and a ResNet