
Interpretable acoustic features for depression detection: a comparative study of healthy & Parkinson’s disease individuals

Barbara Ruvolo

Interpretable acoustic features for depression detection: a comparative study of healthy & Parkinson’s disease individuals.

Supervisors: Antonio Servetti, Mathew Magimai Doss. Politecnico di Torino, Master's degree programme in Ingegneria Del Cinema E Dei Mezzi Di Comunicazione (Cinema and Media Engineering), 2023

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Depression is a psychiatric mood disorder that significantly affects an individual's emotional state and functional abilities. It is often associated with other medical conditions, which is particularly relevant in Parkinson's disease, where non-motor symptoms, including depression, pose substantial challenges to patients' well-being. Detecting depression through speech analysis has gained prominence because emotional and cognitive changes produce perceptible alterations in speech patterns. However, accurately extracting and interpreting the relevant acoustic features remains challenging, especially for speakers with speech difficulties such as those caused by Parkinson's disease. This thesis addresses this challenge by employing various machine learning techniques to identify depression through speech analysis and to interpret the acoustic features most indicative of depression in the speech of healthy individuals and of patients with Parkinson's disease. Two distinct methodologies are explored: a traditional handcrafted feature approach and an end-to-end (E2E) approach using Convolutional Neural Networks (CNNs). The handcrafted approach extracts the eGeMAPS and ComParE feature sets from speech, normalizes these features, and then applies Support Vector Machine, Random Forest, and Gradient Boosting algorithms for classification. The features most relevant to the task are then qualitatively interpreted from the best model. In the E2E approach, which leverages the ability of CNNs to learn pertinent information from input signals, we model three types of signals (original raw speech signals, zero-frequency filtered signals, and composite signals) to understand voice source-related information for detecting depression.
Insights from this research could contribute to the development of more accurate and efficient systems for detecting depression or other mental disorders through non-intrusive methods, such as speech analysis.
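
Illustrative note (not taken from the thesis): the abstract mentions zero-frequency filtered signals but does not describe how they are computed. The sketch below follows the commonly described zero-frequency filtering procedure: difference the signal, pass it through two cascaded zero-frequency resonators, then remove the resulting polynomial trend by repeatedly subtracting a local mean. The function name, the 1.5-pitch-period window, the number of trend-removal passes, and the synthetic 100 Hz test tone are all assumptions made for illustration, and the thesis may use different settings.

```python
import math


def zero_frequency_filter(signal, sample_rate, pitch_period_s=0.010, trend_passes=3):
    """Sketch of zero-frequency filtering for a list of float samples.

    pitch_period_s and trend_passes are illustrative assumptions.
    """
    if not signal:
        return []

    # 1) Difference the signal to suppress any DC offset
    #    (the sample before the first one is treated as zero).
    out = [signal[0]] + [signal[n] - signal[n - 1] for n in range(1, len(signal))]

    # 2) Two cascaded zero-frequency resonators:
    #    y[n] = x[n] + 2*y[n-1] - y[n-2], with zero initial conditions.
    for _ in range(2):
        y = []
        for n, x in enumerate(out):
            y1 = y[n - 1] if n >= 1 else 0.0
            y2 = y[n - 2] if n >= 2 else 0.0
            y.append(x + 2.0 * y1 - y2)
        out = y

    # 3) Trend removal: subtract the mean over a window of about
    #    1.5 pitch periods, repeated a few times. Windows are
    #    truncated at the signal edges, so edge samples are unreliable.
    half = max(1, int(round(0.75 * pitch_period_s * sample_rate)))
    for _ in range(trend_passes):
        prefix = [0.0]
        for v in out:
            prefix.append(prefix[-1] + v)
        detrended = []
        for n in range(len(out)):
            lo = max(0, n - half)
            hi = min(len(out), n + half + 1)
            detrended.append(out[n] - (prefix[hi] - prefix[lo]) / (hi - lo))
        out = detrended
    return out


# Usage on a synthetic 100 Hz tone sampled at 8 kHz (a stand-in for speech).
fs = 8000
tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
zff = zero_frequency_filter(tone, fs, pitch_period_s=0.010)
```

In this kind of analysis, the positive-going zero crossings of the filtered signal are typically read as estimates of glottal closure instants, which is one reason such a signal carries voice source-related information.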

Supervisors: Antonio Servetti, Mathew Magimai Doss
Academic year: 2023/24
Publication type: Electronic
Number of pages: 75
Subjects:
Degree programme: Master's degree programme in Ingegneria Del Cinema E Dei Mezzi Di Comunicazione (Cinema and Media Engineering)
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Joint supervision institution: Idiap Research Institute (SWITZERLAND)
Collaborating companies: Idiap Research Institute
URI: http://webthesis.biblio.polito.it/id/eprint/29509