Data Science for predicting SARS-CoV-2 mortality

Maria Francesca Turco

Data Science for predicting SARS-CoV-2 mortality.

Supervisor: Roberto Fontana. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2022

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

A little over two years have passed since a new, previously unidentified strain of coronavirus spread across the world and changed all of our lives. As the SARS-CoV-2 pandemic spread, the scientific community acted immediately, first sequencing the virus and searching for drugs that could adequately treat those affected, and then turning to vaccines to prevent the spread of the disease. Data Science, which has proven to be very reliable in the medical field, played its part in the fight against this pandemic. Using Data Science to predict the probability of death offers a great opportunity to optimize the allocation of medical resources, which is crucial when responding to a large-scale outbreak of an emerging infectious disease such as COVID-19. The main goal of this work was therefore to develop a Machine Learning model that can identify whether a patient with SARS-CoV-2 is at risk of death. For this purpose, the CDC COVID-19 Case Surveillance dataset was used. This is a very large American private dataset, from which tens of thousands of COVID-19 patients were extracted to start the research activity. Once the first outcomes were available, a second phase considered more than 3 million patients in total, in order to extend the analysis and assess the reliability of the initial results. Based on a literature review, the most commonly used and effective algorithms for predicting mortality in people affected by SARS-CoV-2 were selected: Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Artificial Neural Network and Bayesian Network. All of them were then subjected to a performance evaluation to determine which produced the best results. After several experiments, the most alarming symptoms and the patient characteristics that require closer monitoring were also identified.
The Random Forest classifier showed the highest accuracy, 97.20±0.8. Forecasting a patient's mortality with Machine Learning could help healthcare systems worldwide give more attention and medical care to the people most at risk. These patients would then receive appropriate treatment sooner, which may in turn reduce the overall mortality rate.
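
The performance evaluation described in the abstract can be illustrated with a minimal sketch. It is not the thesis code. The data are synthetic, the two toy models (a majority-class baseline and a one-feature decision stump) merely stand in for the algorithms listed above, and every function name is hypothetical. The sketch also assumes k-fold cross-validation, which the abstract does not specify, and only shows how an accuracy can be reported as mean ± standard deviation.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Per-fold accuracy for a model given as fit(X_train, y_train) -> predict(x)."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, then test on the i-th.
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        predict = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return scores

def fit_majority(X_train, y_train):
    """Baseline: always predict the most frequent training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority

def fit_stump(X_train, y_train):
    """Decision stump: choose the threshold on x[0] with the best training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted({x[0] for x in X_train}):
        hits = sum((x[0] >= t) == bool(label) for x, label in zip(X_train, y_train))
        acc = hits / len(y_train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x[0] >= best_t)

def summarize(scores):
    """Format fold accuracies as 'mean ± std' in percent."""
    return f"{100 * statistics.mean(scores):.2f} ± {100 * statistics.stdev(scores):.2f}"

# Synthetic, made-up data (assumption): a single feature, age, where
# label 1 is far more likely for patients aged 70 or older.
rng = random.Random(42)
X = [[rng.randint(20, 90)] for _ in range(500)]
y = [int(x[0] >= 70 and rng.random() < 0.9) for x in X]

results = {
    "majority baseline": cross_val_accuracy(fit_majority, X, y),
    "decision stump": cross_val_accuracy(fit_stump, X, y),
}
for name, scores in results.items():
    print(f"{name}: {summarize(scores)}")
```

Passing each model as a `fit` function keeps the evaluation loop independent of the model, so the same folds are used for every candidate and the reported accuracies are directly comparable.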

Supervisors: Roberto Fontana
Academic year: 2022/23
Publication type: Electronic
Number of pages: 122
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/24603