Automatic uncertainty-aware calibration for improved AI generalization in multi-center brain tumor MRI classification

Federica Apolloni

Automatic uncertainty-aware calibration for improved AI generalization in multi-center brain tumor MRI classification.

Supervisors: Massimo Salvi, Silvia Seoni. Politecnico di Torino, Master's degree in Biomedical Engineering, 2025

PDF (Tesi_di_laurea) - Thesis
Access restricted to: staff users only until 17 April 2027 (embargo date).
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Artificial intelligence (AI) in brain tumor classification represents a potentially powerful support for clinicians. It can process large volumes of images efficiently, detect subtle patterns often imperceptible to the human eye, and provide objective predictions. This capability accelerates and refines the diagnostic process and also supports early detection, which is essential for defining the most appropriate treatment plan and improving patient outcomes. In clinical contexts, high accuracy alone is not enough; models must also provide reliable predictions. A critical limitation is overconfidence: deep neural networks often give highly confident answers even when they are wrong, without recognizing the limits of their own knowledge. This results from miscalibration, meaning that model confidence does not reflect actual correctness. AI models often perform well on internal datasets, similar to those used for training, but may provide unreliable predictions on out-of-distribution data, further motivating the need for proper calibration. It is therefore essential to integrate uncertainty quantification (UQ) with reject option mechanisms, retaining only the most reliable cases while highlighting those at higher risk of error. This work introduces a modular and reusable pipeline for UQ applied to brain tumor classification. Deep learning training is performed on a brain MRI dataset that includes three tumor types and healthy controls. The method is based on Monte Carlo Dropout (MCD), which keeps dropout active at inference to generate multiple predictions for each input, from which uncertainty metrics are derived. The pipeline systematically explores alternative dropout configurations and uncertainty metrics, and automatically selects the setting that most clearly discriminates between correctly classified and misclassified samples.
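As an illustration of the MCD step described above, the following is a minimal sketch in plain Python, not the thesis implementation. It assumes a toy linear classifier with four classes (three tumor types and healthy controls) and uses predictive entropy and mutual information as example uncertainty metrics; the abstract does not state which metrics or dropout rates the pipeline actually explores. All names (`mc_dropout`, `WEIGHTS`, `FEATURES`, the class labels) and all numeric values are hypothetical.

```python
import math
import random

# Hypothetical class labels: the abstract only says "three tumor types and healthy controls".
CLASSES = ["tumor_a", "tumor_b", "tumor_c", "healthy"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def stochastic_forward(features, weights, drop_rate, rng):
    """One forward pass of a toy linear classifier with dropout left on."""
    keep = 1.0 - drop_rate
    # Inverted dropout: zero each feature with probability drop_rate, rescale the rest.
    dropped = [f / keep if rng.random() < keep else 0.0 for f in features]
    logits = [sum(w * f for w, f in zip(row, dropped)) for row in weights]
    return softmax(logits)

def mc_dropout(features, weights, drop_rate=0.3, n_passes=50, seed=0):
    """Run n_passes stochastic forward passes and summarise their spread."""
    rng = random.Random(seed)
    passes = [stochastic_forward(features, weights, drop_rate, rng)
              for _ in range(n_passes)]
    n_classes = len(weights)
    mean_probs = [sum(p[c] for p in passes) / n_passes for c in range(n_classes)]
    predictive_entropy = entropy(mean_probs)                     # total uncertainty
    expected_entropy = sum(entropy(p) for p in passes) / n_passes
    return {
        "predicted_class": max(range(n_classes), key=mean_probs.__getitem__),
        "mean_probs": mean_probs,
        "predictive_entropy": predictive_entropy,
        # Mutual information: disagreement between passes (model uncertainty).
        "mutual_information": predictive_entropy - expected_entropy,
    }

# Illustrative (made-up) weights and input features.
WEIGHTS = [[0.9, -0.4, 0.2], [-0.3, 0.8, 0.1], [0.2, 0.1, -0.7], [-0.5, 0.3, 0.6]]
FEATURES = [1.0, 0.5, -0.2]

result = mc_dropout(FEATURES, WEIGHTS)
print(CLASSES[result["predicted_class"]], result["predictive_entropy"])
```

In a real deep learning framework the same idea would amount to leaving the dropout layers in training mode during inference and repeating the forward pass; exploring alternative dropout configurations would then correspond to varying `drop_rate` (and, in a deeper network, which layers apply dropout).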
On this basis, an optimal threshold is identified as a trade-off between separation accuracy and an adequate proportion of cases that can still be confidently addressed by the model. A key aspect of this study is that the threshold, estimated on the validation set, is tested on completely external datasets, ensuring that robustness and generalization are assessed beyond the data seen during training. This choice reflects realistic clinical conditions, where a model must perform reliably on unseen data rather than on familiar distributions. The pipeline integrates performance metrics and graphical analyses, enabling the evaluation not only of classification accuracy but also of the quality of predictive confidence. Results demonstrate a clear improvement in calibration and in performance on accepted cases, without significantly reducing coverage. Overall, this work provides a structured and automatic approach to AI model calibration through MCD, showing substantial improvements that generalize to external test datasets, and supporting the adoption of more reliable neural networks in medical imaging.
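The threshold selection and reject option described above can be sketched as follows. This is an assumed formulation, not the criterion used in the thesis: "separation accuracy" is taken here to be the fraction of samples that are either correct and accepted or wrong and rejected, and the coverage requirement is modelled as a hypothetical `min_coverage` parameter. The uncertainty values and correctness flags are made-up illustration data.

```python
def evaluate_threshold(uncertainties, correct, threshold):
    """Accept a prediction when its uncertainty is at or below the threshold."""
    accepted = [u <= threshold for u in uncertainties]
    n = len(uncertainties)
    n_accepted = sum(accepted)
    # Separation accuracy (assumed definition): correct predictions that are
    # accepted plus wrong predictions that are rejected, over all samples.
    separation = sum(a == bool(c) for a, c in zip(accepted, correct)) / n
    accepted_accuracy = (
        sum(bool(c) for a, c in zip(accepted, correct) if a) / n_accepted
        if n_accepted else float("nan")
    )
    return {
        "threshold": threshold,
        "coverage": n_accepted / n,
        "separation": separation,
        "accepted_accuracy": accepted_accuracy,
    }

def select_threshold(uncertainties, correct, min_coverage=0.6):
    """Pick the threshold with the best separation among those keeping enough cases."""
    best = None
    for t in sorted(set(uncertainties)):
        stats = evaluate_threshold(uncertainties, correct, t)
        if stats["coverage"] < min_coverage:
            continue
        if best is None or stats["separation"] > best["separation"]:
            best = stats
    return best

# Made-up validation data: per-sample uncertainty and whether the prediction was correct.
val_unc = [0.05, 0.10, 0.12, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
val_correct = [True, True, True, True, True, False, True, False, False, False]
best = select_threshold(val_unc, val_correct)

# The threshold is frozen on validation data and then applied, unchanged,
# to a (made-up) external set.
ext_unc = [0.08, 0.30, 0.50, 0.70, 0.95]
ext_correct = [True, True, False, False, False]
external = evaluate_threshold(ext_unc, ext_correct, best["threshold"])
print(best["threshold"], external["coverage"], external["accepted_accuracy"])
```

Freezing the threshold before touching the external data mirrors the evaluation protocol in the abstract: the external set measures how well the accept/reject rule transfers, rather than how well it can be tuned.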
Supervisors:	Massimo Salvi, Silvia Seoni
Academic year:	2025/26
Publication type:	Electronic
Number of pages:	84
Subjects:
Degree programme:	Master's degree in Biomedical Engineering
Degree class:	New system > Master's degree > LM-21 - BIOMEDICAL ENGINEERING
Collaborating companies:	NOT SPECIFIED
URI:	http://webthesis.biblio.polito.it/id/eprint/37393
