Computational Biases of Foundation Models for Speech Emotion Recognition: A Quantitative Analysis

Elena Di Felice

Supervisors: Giuseppe Rizzo, Federico D'Asaro. Politecnico di Torino, Master's degree course in Data Science And Engineering, 2024

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

As Artificial Intelligence systems become more widely used in our daily lives, it is crucial to ensure not only their accuracy but also their fairness. In this study, I focused on assessing fairness and the possible presence of bias in systems that address the task of Speech Emotion Recognition (SER). SER is the process of automatically detecting and understanding the emotional content conveyed through spoken language. It relies on analyzing acoustic features of the speech signal, independently of the actual linguistic content. The experiments were conducted using the only two datasets available in Italian for this task, Emozionalmente and EMOVO. I implemented the fairness metrics most commonly used in the literature (Disparate Impact, Statistical Parity, Average Odds and Equal Opportunity), as well as two baselines on which to run the tests: a Support Vector Machine (SVM) model, with two different feature extraction methods (MFCC and MFMC), and a ResNet. Two sensitive attributes were considered for the analysis, based on the information about the subjects made available by the datasets: in the experiments using EMOVO only gender was considered, while in those using Emozionalmente I was also able to consider age. I then tested the fairness, using the same metrics, of WavLM, a new transformer-based pre-trained model. By comparing the results, I was able to examine how different algorithms use the intrinsic information contained in the audio recordings to predict the labels; by changing the distribution of subjects in the training datasets, I was able to verify whether and how the training data affect the output in terms of bias. Furthermore, by performing the experiments on a model that achieves better accuracy than the baselines, I was also able to draw conclusions about the relationship between bias and accuracy.
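
The four fairness metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not give the exact definitions used, so the code assumes one common convention (differences and ratios taken as unprivileged group versus privileged group), a binary sensitive attribute such as gender, and a binarised one-vs-rest label for a single emotion. The function names and the toy data are hypothetical.

```python
def _rate(numerator, denominator):
    """Safe division; returns NaN when a group has no matching samples."""
    return numerator / denominator if denominator else float("nan")


def group_rates(y_true, y_pred, group, value):
    """Selection rate, true positive rate and false positive rate for one group."""
    idx = [i for i, g in enumerate(group) if g == value]
    pos = [i for i in idx if y_true[i] == 1]
    neg = [i for i in idx if y_true[i] == 0]
    selection = _rate(sum(y_pred[i] for i in idx), len(idx))
    tpr = _rate(sum(y_pred[i] for i in pos), len(pos))
    fpr = _rate(sum(y_pred[i] for i in neg), len(neg))
    return selection, tpr, fpr


def fairness_metrics(y_true, y_pred, group, unprivileged, privileged):
    """Assumed convention: every quantity compares unprivileged to privileged."""
    sel_u, tpr_u, fpr_u = group_rates(y_true, y_pred, group, unprivileged)
    sel_p, tpr_p, fpr_p = group_rates(y_true, y_pred, group, privileged)
    return {
        # Difference in the rate of positive predictions between groups.
        "statistical_parity_difference": sel_u - sel_p,
        # Ratio of the same rates; 1.0 means parity.
        "disparate_impact": _rate(sel_u, sel_p),
        # Difference in true positive rates.
        "equal_opportunity_difference": tpr_u - tpr_p,
        # Mean of the FPR difference and the TPR difference.
        "average_odds_difference": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),
    }


# Toy data (hypothetical): 1 = target emotion, 0 = any other emotion.
group = ["f", "f", "f", "f", "m", "m", "m", "m"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

metrics = fairness_metrics(y_true, y_pred, group, unprivileged="f", privileged="m")
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On this toy data the three difference metrics are all -0.5 and the disparate impact is about 0.33, meaning the hypothetical classifier predicts the target emotion less often, and less accurately, for the "f" group. For a multiclass task such as SER, one option is to repeat the computation once per emotion class.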

Supervisors: Giuseppe Rizzo, Federico D'Asaro
Academic year: 2023/24
Publication type: Electronic
Number of Pages: 99
Degree course: Master's degree course in Data Science And Engineering
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: FONDAZIONE LINKS
URI: http://webthesis.biblio.polito.it/id/eprint/31441