Extraction and identification of information from Mass Spectra of the breath of patients infected with SARS-CoV-2

Abdul Hadi Saeed

Extraction and identification of information from Mass Spectra of the breath of patients infected with SARS-CoV-2.

Supervisors: Giovanni Squillero, Riccardo Cantoro, Nicolo' Bellarmino. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2023


COVID-19 is an infectious disease caused by the SARS-CoV-2 virus, which first emerged in Wuhan, China, in late 2019. It quickly spread globally, leading to a pandemic that has affected millions of people and resulted in numerous deaths. The peak period of COVID-19 ran from 2019 to 2021, during which people worldwide were affected in many ways, including health, the economy, and business. To restore normal life, health authorities worked to bring the virus under control. Authorities such as the World Health Organization (WHO) reported that the symptoms of COVID-19 range from mild to severe and can include fever, cough, fatigue, loss of taste or smell, and difficulty breathing. Scientific studies also established that the virus spreads primarily through respiratory droplets produced when an infected person talks, coughs, or sneezes. Preventing the spread of the virus requires precautions such as maintaining social distancing, washing hands frequently, avoiding large gatherings, and wearing a mask. It is also necessary to identify infected persons (those with a positive COVID-19 test result) and isolate them. Several test methods exist for identifying infected people; the two main types, both invasive, are diagnostic tests (PCR) and antibody tests. Such tests can become tiring, stressful, and hazardous for a person who has to undergo them many times. NanoTech Analysis S.r.l. (NTA) has proposed non-invasive breath analysis as an alternative testing solution. This method requires only the person's breath: a mass spectrum is obtained from a sample of the patient's breath, and the mass-spectral data can be used to detect the virus with machine learning algorithms.
The purpose of this research is to identify potential problems, and to propose solutions to them, in the process of classifying COVID-19 patients using the dataset provided by NTA. While classical GC-MS techniques take too long to analyze a sample, NTA's dedicated mass-spectra analyzer is well suited to COVID-19 prediction, since the device can perform several acquisitions of mass spectra of a patient's breath in just a few minutes. Data analysis and machine learning methods are used to extract relevant knowledge from the raw mass spectra acquired for each patient. Pre-processing steps are performed to make the measurements more robust to noise and outliers. Supervised ML models are then used to find a relationship between the compounds extracted from the mass spectra and the patient's COVID-19 positivity. Examination of the dataset revealed several problems: an unbalanced dataset, dimensionality, outliers, duplicate data, and day bias (an effect of the day on which the sample was acquired), among others. To resolve these problems, this study applies several solutions, such as generating new features, generating synthetic data, and applying different normalization approaches to the dataset. After these issues were addressed, several supervised algorithms, such as random forest, decision tree, and support-vector machines (SVM), were used to identify COVID-19-positive patients.
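The abstract mentions normalization approaches and day bias without specifying them, and the full text is not available. As an illustration only, the following minimal sketch shows two commonly used steps that could address such issues: scaling each spectrum to unit total intensity, and standardizing each m/z channel within each acquisition day. The function names, the toy data, and the choice of these two techniques are assumptions made for this example, not the method used in the thesis.

```python
from collections import defaultdict
from statistics import mean, pstdev


def tic_normalize(spectrum):
    """Scale one spectrum so that its intensities sum to 1.

    Assumption: a total-intensity normalization; the thesis does not
    state which normalization it uses.
    """
    total = sum(spectrum)
    if total == 0:
        return list(spectrum)
    return [x / total for x in spectrum]


def standardize_per_day(spectra, days):
    """Z-score every channel separately within each acquisition day.

    Assumption: a simple way to reduce day-to-day offsets. Channels
    that are constant within a day are mapped to 0.0.
    """
    by_day = defaultdict(list)
    for idx, day in enumerate(days):
        by_day[day].append(idx)

    out = [None] * len(spectra)
    for idxs in by_day.values():
        n_channels = len(spectra[idxs[0]])
        mus = [mean([spectra[i][c] for i in idxs]) for c in range(n_channels)]
        sds = [pstdev([spectra[i][c] for i in idxs]) for c in range(n_channels)]
        for i in idxs:
            out[i] = [
                (spectra[i][c] - mus[c]) / sds[c] if sds[c] > 0 else 0.0
                for c in range(n_channels)
            ]
    return out


# Hypothetical toy data: four spectra with three channels each,
# acquired on two different days with very different overall scales.
spectra = [
    [10.0, 30.0, 60.0],
    [20.0, 20.0, 60.0],
    [100.0, 300.0, 600.0],
    [400.0, 200.0, 400.0],
]
days = ["day1", "day1", "day2", "day2"]

normalized = [tic_normalize(s) for s in spectra]
standardized = standardize_per_day(normalized, days)
```

In this toy example the first and third spectra differ only by an overall scale factor, so they become equal after the first step. The second step centers every channel within each day, which removes a constant per-day offset but would also remove a real difference between days if positive and negative samples were collected on different days.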

Supervisors: Giovanni Squillero, Riccardo Cantoro, Nicolo' Bellarmino
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 81
Additional Information: Thesis under embargo. Full text not available
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: NanoTech Analysis srl
URI: http://webthesis.biblio.polito.it/id/eprint/26854