Automated Deep Learning of Primary Progressive Aphasia (PPA) variants from cognitive-test voice recordings and T1-weighted MRI data

Alice Corini

Automated Deep Learning of Primary Progressive Aphasia (PPA) variants from cognitive-test voice recordings and T1-weighted MRI data.

Supervisors: Filippo Molinari, Massimo Salvi, Massimo Filippi, Federica Agosta, Silvia Basaia. Politecnico di Torino, Master's degree programme in Biomedical Engineering, 2025

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Primary Progressive Aphasia (PPA) is a neurodegenerative disorder characterized by a gradual decline in language abilities, caused by localized atrophy in specific cortical regions. It manifests in three variants: non-fluent (nfvPPA), semantic (svPPA), and logopenic (lvPPA), each associated with distinct linguistic profiles and atrophy patterns. Standardized language batteries, together with structural Magnetic Resonance Imaging (MRI) quantification of atrophy, remain mainstays of clinical diagnosis. This thesis developed deep learning models for the automatic classification of PPA variants by integrating two clinical modalities: voice recordings collected during clinician-administered cognitive tests and volumetric three-dimensional (3D) T1-weighted structural MRI. For the audio stream, multiple network architectures were designed and compared to identify the most effective model for PPA discrimination. Moreover, a three-dimensional Convolutional Neural Network (3D-CNN) was trained on 3D T1-weighted MRI images. Finally, a late-fusion approach combined the predictions from the audio and MRI models to yield more robust overall performance. Data were collected at the IRCCS San Raffaele Scientific Institute and included 94 PPA patients (38 nfvPPA, 36 svPPA, 20 lvPPA) and 91 healthy controls (HC). Of these, 81 PPA and 38 HC completed the Picnic picture description task from the Western Aphasia Battery (WAB), designed to evaluate connected speech production. Moreover, 90 PPA (38 nfvPPA, 33 svPPA, 19 lvPPA) and 82 HC underwent structural MRI. Audio signals underwent preprocessing for noise removal, normalization, and segmentation. The resulting fragments were converted into log-Mel spectrograms and used to train a two-dimensional Convolutional Neural Network (2D-CNN). In a second approach, a 2D-CNN combined with a Bidirectional Long Short-Term Memory (BiLSTM) network was implemented to capture both spectral and temporal dependencies of speech. 
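The normalization and segmentation steps mentioned above can be illustrated with a minimal sketch. The abstract does not state which normalization scheme, fragment length, or overlap the thesis used, so peak normalization, the function names `peak_normalize` and `segment`, and the toy frame and hop sizes below are all assumptions chosen for illustration only.

```python
from typing import List, Sequence


def peak_normalize(signal: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a waveform so its largest absolute sample equals 1.0 (assumed scheme)."""
    peak = max((abs(s) for s in signal), default=0.0)
    if peak < eps:
        # Silent or empty input: return a copy unchanged to avoid dividing by zero.
        return list(signal)
    return [s / peak for s in signal]


def segment(signal: Sequence[float], frame_len: int, hop_len: int) -> List[List[float]]:
    """Split a waveform into fixed-length fragments; a trailing partial fragment is dropped."""
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    return [
        list(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop_len)
    ]


# Toy example: 10 samples, fragments of 4 samples with a hop of 2 (50% overlap).
# Real recordings would use sizes expressed in seconds times the sampling rate.
waveform = [0.0, 0.5, -2.0, 1.0, 0.25, 0.0, -0.5, 1.5, 0.0, 0.5]
normalized = peak_normalize(waveform)
fragments = segment(normalized, frame_len=4, hop_len=2)
```

In a pipeline like the one described, each fragment would then be converted to a log-Mel spectrogram before being passed to the network; that conversion is omitted here because it depends on parameters the abstract does not give.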
In a third approach, a Wav2Vec 2.0 model pre-trained on Italian speech was fine-tuned using raw one-dimensional audio as input, with additional classification layers trained on the PPA dataset. MRI images were preprocessed through brain extraction, normalization to the Montreal Neurological Institute (MNI) stereotaxic space, and cropping to the regions most affected by atrophy across PPA variants, and were then used to train a 3D-CNN. A late-fusion strategy was subsequently applied to combine the predictions of the audio and MRI networks. All architectures were evaluated in both multiclass and binary classification settings. The 2D-CNN with BiLSTM achieved the best performance among the audio-based models in the multiclass task and performed well in the binary comparisons. The study demonstrates that deep learning architectures can be used effectively for the automatic classification of PPA variants.
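As an illustration of late fusion, the sketch below combines per-class probabilities from an audio model and an MRI model by weighted averaging. The abstract does not say which fusion rule the thesis adopted, so the averaging rule, the weight `w_audio`, the class ordering in `CLASSES`, and the example logits are all hypothetical.

```python
import math
from typing import List, Sequence

# Assumed class ordering for the multiclass setting (HC plus the three PPA variants).
CLASSES = ["HC", "nfvPPA", "svPPA", "lvPPA"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities, shifting by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(p_audio: Sequence[float], p_mri: Sequence[float],
                w_audio: float = 0.5) -> List[float]:
    """Weighted average of two probability vectors (one possible fusion rule)."""
    if len(p_audio) != len(p_mri):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= w_audio <= 1.0:
        raise ValueError("w_audio must lie in [0, 1]")
    return [w_audio * a + (1.0 - w_audio) * m for a, m in zip(p_audio, p_mri)]


# Made-up logits for one subject: the audio model slightly favours nfvPPA,
# while the MRI model clearly favours lvPPA.
audio_probs = softmax([0.2, 1.5, 0.3, 1.4])
mri_probs = softmax([0.1, 0.4, 0.2, 2.0])

fused = late_fusion(audio_probs, mri_probs)
predicted = CLASSES[max(range(len(fused)), key=fused.__getitem__)]
```

With equal weights, the more confident MRI prediction outweighs the narrow audio margin in this toy case. In practice the weight would be tuned on validation data, and only subjects with both modalities available can be fused this way.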

Supervisors: Filippo Molinari, Massimo Salvi, Massimo Filippi, Federica Agosta, Silvia Basaia
Academic year: 2025/26
Publication type: Electronic
Number of pages: 159
Subjects:
Degree programme: Master's degree programme in Biomedical Engineering
Degree class: New system > Master's degree > LM-21 - BIOMEDICAL ENGINEERING
Collaborating companies: Ospedale San Raffaele S.r.l.
URI: http://webthesis.biblio.polito.it/id/eprint/38351