Voice Classification in Parkinson’s Disease Using Transformer Models and Error Rate Metrics

Benedetta Perrone

Supervisors: Gabriella Olmo, Federica Amato. Politecnico di Torino, Master's degree programme in Biomedical Engineering, 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Parkinson's disease (PD) is a neurodegenerative disorder resulting from the progressive degeneration of dopaminergic neurons in the substantia nigra pars compacta. Alongside motor symptoms like bradykinesia, tremor, and rigidity, PD is also associated with non-motor impairments, including cognitive decline, depression, sleep disorders, and autonomic dysfunctions. One of the most prevalent non-motor symptoms is the alteration of voice and speech, affecting up to 90% of PD patients. The progressive decline in vocal function can lead to hypokinetic dysarthria, reducing speech intelligibility, volume, and prosody, with a significant impact on patients’ quality of life. This study has two main objectives: (1) to distinguish between healthy individuals and those with Parkinson's disease based on vocal characteristics, and (2) to assess disease severity using Word Error Rate (WER) and Character Error Rate (CER), exploring their correlation with Unified Parkinson's Disease Rating Scale (UPDRS) scores. The models used include the Vision Transformer (ViT) and the Audio Spectrogram Transformer (AST), trained on vocal recordings from datasets comprising both PD patients and healthy controls. The preprocessing pipeline included resampling to 16 kHz, volume normalization, and outlier removal. Mel-spectrograms were generated for ViT, while AST directly processed the waveform. Both models were trained using 5-fold cross-validation to ensure robustness, and their performance was evaluated in terms of accuracy, precision, recall, and F1-score. WER and CER metrics were calculated using OpenAI's Whisper model and compared between patients and healthy controls. Statistical analysis, including Shapiro-Wilk and Mann-Whitney U tests, revealed significant differences in WER and CER between patients and controls, indicating a correlation between increased vocal production errors and disease severity (UPDRS). 
The results suggest that integrating deep learning methodologies in clinical settings could offer promising, non-invasive tools for early diagnosis and continuous monitoring of PD, for example through remote monitoring based on mobile voice recordings. Additionally, explainability techniques could generate heatmaps highlighting critical areas of the mel-spectrogram, enabling the identification and restoration of unintelligible speech elements directly from spectrograms. This approach could support the development of personalized text-to-speech systems to aid communication for patients with severe vocal impairments.
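
The WER and CER metrics named in the abstract are both normalized edit distances: the minimum number of substitutions, deletions, and insertions needed to turn the reference text into the recognized transcript, divided by the reference length in words (WER) or characters (CER). The Python sketch below illustrates that definition only. It is not the thesis's code: the function names are hypothetical, lowercasing is an assumed text normalization, and the transcripts are plain strings standing in for the output of a speech recognizer such as Whisper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, deletions, and insertions each cost 1.
    """
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    # Assumption: lowercasing and whitespace splitting as normalization.
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    ref_chars = list(reference.lower())
    hyp_chars = list(hypothesis.lower())
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Because insertions count as errors, both rates can exceed 1 when the transcript is much longer than the reference. Per-speaker values computed this way could then be compared between groups with the non-parametric tests the abstract mentions.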
Supervisors:	Gabriella Olmo, Federica Amato
Academic year:	2024/25
Publication type:	Electronic
Number of pages:	96
Subjects:
Degree programme:	Master's degree programme in Biomedical Engineering
Degree class:	New organization > Master's degree > LM-21 - BIOMEDICAL ENGINEERING
Collaborating companies:	Not specified
URI:	http://webthesis.biblio.polito.it/id/eprint/33679
