Martina Andrulli
Machine learning modelling for mortality prediction in a population of older adults.
Supervisors: Monica Visintin, Brendan O'Flynn, Salvatore Tedesco. Politecnico di Torino, Master's degree programme in Ict For Smart Societies (Ict Per La Società Del Futuro), 2020
Abstract: |
The rapid ageing of the population has drawn the interest of researchers and clinicians, with the ultimate goal of extending the human lifespan while minimising overall healthcare costs. Moreover, the increased use of healthcare resources by the older population is putting significant pressure on healthcare systems worldwide. For this reason, the efficient assessment of patients’ current health status and, even more, prognostic modelling for the development of mortality risk prediction algorithms are of primary importance: an accurate prediction of a person’s mortality risk can improve long-term survival and reduce the economic burden of healthcare. The rationale behind this study was to demonstrate the value that machine learning, through accurate and objective predictions, could offer in this scenario. In particular, the main objective was to use clinical data to develop machine learning algorithms for all-cause mortality prediction in a cohort of healthy older adults (≥ 70 years). The hypothesis investigated was that all-cause mortality could be predicted mostly from data collected via activity trackers and questionnaires, thus significantly reducing the cost burden on the healthcare system thanks to the low cost and ease of use of these tools. The data were taken from the Healthy Ageing Initiative study in Umeå, Sweden (2291 participants), collected over a five-year span (from January 2013 to December 2017), and include medical history, lab tests, physical activity and behaviour indicators, such as alcohol and tobacco use and mental well-being.
The main challenge of this work arises from the severely imbalanced dataset: only 4% of cases (92 participants) belonged to the positive class, representing participants who died between their data collection and the end-of-study date (31st December 2019). A series of machine learning techniques were adopted to overcome the problem, involving data augmentation, feature engineering, oversampling/undersampling, probability calibration, and ensemble methodologies built on several base classifiers, such as Logistic Regression, Decision Tree, Support Vector Machine, etc. The resulting model involves a feature selection phase (applied through the Feature Selection Component Analysis - FSCA algorithm), followed by an outlier removal step performed by means of the Isolation Forest technique. Subsequently, the developed model relies on an ensemble of AdaBoost base classifiers, where each base classifier is tuned via a grid search and trained on a chunk of the training data to which the Random Balance oversampling technique is applied (to balance both classes in the chunk). The outputs of the base classifiers are finally combined through a soft-voting strategy. The final results are promising, showing an area under the curve (AUC) ≥ 0.79, which is in line with the state-of-the-art results in the literature on the topic. Further analyses are required to test the generalisability of the developed model to other older populations globally, and to investigate how to increase the transparency and interpretability of the model for end-users without impacting overall performance. |
---|---|
Supervisors: | Monica Visintin, Brendan O'Flynn, Salvatore Tedesco |
Academic year: | 2020/21 |
Publication type: | Electronic |
Number of pages: | 127 |
Additional information: | Confidential thesis. Full text not available |
Subjects: | |
Degree programme: | Master's degree programme in Ict For Smart Societies (Ict Per La Società Del Futuro) |
Degree class: | New system > Master's degree > LM-27 - TELECOMMUNICATIONS ENGINEERING |
Joint supervision institution: | UNIVERSIDAD POLITECNICA DE CATALUNYA - ETSET BARCELONA (SPAIN) |
Collaborating companies: | Tyndall National Institute |
URI: | http://webthesis.biblio.polito.it/id/eprint/16640 |
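
The chunked, class-balanced, soft-voting ensemble described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation, which is not public. The function names (`split_into_chunks`, `oversample_minority`, `soft_vote`) are hypothetical. Plain random oversampling stands in for the Random Balance technique. The AdaBoost base classifiers, grid search, FSCA feature selection and Isolation Forest step are omitted, so each base model is represented only by the positive-class probabilities it would output.

```python
import random


def split_into_chunks(indices, n_chunks, rng):
    """Shuffle sample indices and deal them into n_chunks roughly equal chunks."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_chunks] for i in range(n_chunks)]


def oversample_minority(chunk, labels, rng):
    """Duplicate randomly chosen minority-class indices until both classes match.

    Assumption: simple random oversampling, used here in place of Random Balance.
    """
    pos = [i for i in chunk if labels[i] == 1]
    neg = [i for i in chunk if labels[i] == 0]
    if not pos or not neg:
        return chunk[:]  # only one class present: nothing to balance
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def soft_vote(probabilities, threshold=0.5):
    """Average the positive-class probabilities of all models, then threshold."""
    n_models = len(probabilities)
    n_samples = len(probabilities[0])
    mean = [sum(p[j] for p in probabilities) / n_models for j in range(n_samples)]
    return mean, [1 if m >= threshold else 0 for m in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy labels with a 4% positive class, mirroring the imbalance in the abstract.
    labels = [1] * 4 + [0] * 96
    chunks = split_into_chunks(list(range(len(labels))), 4, rng)
    balanced = [oversample_minority(c, labels, rng) for c in chunks]
    # One base classifier would be trained per balanced chunk; here three
    # hypothetical models each score the same two test samples.
    mean, predictions = soft_vote([[0.2, 0.9], [0.4, 0.7], [0.3, 0.8]])
    print(predictions)
```

Soft voting averages probabilities rather than counting hard votes, so a base classifier that is confident about a sample carries more weight than one that is near the decision threshold.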