Features selection for SARS-CoV-2 spread in Italy: observation at regional and provincial level

Giulia Ciaramella

Supervisors: Monica Visintin, Guido Pagana. Politecnico di Torino, Master's degree programme in ICT for Smart Societies (Ict Per La Società Del Futuro), 2020

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Archive (ZIP) (Documenti_allegati) - Other
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

At the end of 2019 a novel virus, later named SARS-CoV-2, began circulating in the Wuhan area of China, and a few months later it caused the COVID-19 pandemic. The new disease, which can cause severe pneumonia, strained the health systems of many countries: hospitals were overcrowded and some patients did not receive the necessary care because of the limited number of ventilators. Having spread across the entire globe, the disease captured the attention of scientists of all kinds. In the domain of machine learning, researchers applied their knowledge to diagnostic imaging, to therapeutics (studying which drugs already on the market might be usable), and to forecasting methods that estimate in advance the number of future cases. What is missing in the literature is a study of the possible causes of the virus spread. Since the beginning of the epidemic, the virus has spread severely in some countries and less harshly in others. Moreover, in some territories such as Italy, the spread differs among regions and provinces, and the reason is unknown. The aim of this study is therefore to take Italy as a case study, considering first provinces and then regions, and to find which features, among a selected set (territorial, meteorological, demographic, etc.), most affected the spread of the virus in the territory. To address the problem, several regression methods were used: linear regression, ensemble trees and Gaussian process regression. All three machine learning methods output not only the regressed value but also the feature relevance, which made feature selection possible. The regressions were repeated with different target variables: the cumulative number of cases and the variation in the number of cases for both provinces and regions, and the number of deaths for regions only.
The period of study was 26th February - 17th April for regions and 16th February - 11th April for provinces, due to the limited availability of some data. The regression methods were compared at the end of each regression using performance indicators (R², RMSE and error variance), so that it was possible to conclude which method was better for each specific dataset and target. Finally, a conclusion is given about the features that appear to be most related to the spread of the virus in Italy.
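
As an illustration only, the three performance indicators named in the abstract can be computed from a vector of true values and a vector of predictions roughly as follows. This is a minimal sketch using the standard textbook definitions; the abstract does not state the exact formulas used in the thesis (for example population versus sample variance), so those choices, the function name `regression_indicators` and the sample numbers are all assumptions.

```python
import math
from statistics import mean, pvariance


def regression_indicators(y_true, y_pred):
    """Return R^2, RMSE and error variance for one set of predictions.

    Hypothetical helper: standard definitions, not taken from the thesis.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Prediction errors (residuals), one per sample.
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Residual and total sums of squares for the coefficient of determination.
    ss_res = sum(e * e for e in errors)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Root mean squared error.
    rmse = math.sqrt(ss_res / len(errors))

    # Population variance of the errors (assumed; sample variance is also common).
    return {"r2": r2, "rmse": rmse, "error_variance": pvariance(errors)}


# Made-up values, purely to exercise the function.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
indicators = regression_indicators(y_true, y_pred)
```

Computing the same dictionary for each regression method on the same target would give one way to compare the methods side by side, in the spirit of the comparison the abstract describes.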

Supervisors: Monica Visintin, Guido Pagana
Academic year: 2020/21
Publication type: Electronic
Number of Pages: 64
Degree programme: Master's degree programme in ICT for Smart Societies (Ict Per La Società Del Futuro)
Degree class: New organization > Master science > LM-27 - TELECOMMUNICATIONS ENGINEERING
Collaborating companies: FONDAZIONE LINKS
URI: http://webthesis.biblio.polito.it/id/eprint/15978