
Data Analysis of a non-invasive breath test method for SARS-CoV-2

Federica Buccolini

Data Analysis of a non-invasive breath test method for SARS-CoV-2.

Supervisors: Giovanni Squillero, Riccardo Cantoro, Nicolo' Bellarmino. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2022


Nowadays, the entire world is still facing the COVID-19 pandemic. Getting the virus under control is the first goal in order to return to a normal life. Epidemiological studies show that the virus can spread through fluids from an infected person's mouth or nose. It is important to identify COVID-19 clusters (groups of people who test positive for COVID-19) and isolate them, in order to avoid the spread of the virus. Mass screening is useful for testing many people in the shortest possible time and identifying positive patients. Several kinds of tests are available on the market, most of them invasive. Invasive tests can be difficult to manage because they require medical assistants and, in addition, being tested many times can be stressful or even dangerous for a person. The solution proposed by NanoTech Analysis S.r.l. (NTA) is a non-invasive breath study. The instrument obtains, from a sample of the patient's breath, a mass spectrum that can be exploited using Machine Learning algorithms.

In this work the dataset provided by NTA is studied and results are presented. The aim is the classification of COVID-19 patients, using Machine Learning techniques to analyze the dataset. First, the mass spectra obtained are studied in order to understand them. For a gaseous sample, it is common to perform Gas Chromatography Mass Spectrometry (GC-MS) analysis. This technique requires a lot of time, because each compound present in the sample is evaluated separately, so it cannot be used for our purpose, mass screening. The NTA instrument is able to process the gaseous sample and show the result in 15 minutes. The output of the instrument is a mass spectrum that gives an overall view of the quantities of the compounds present in the examined sample.

Several problems are identified (e.g. class imbalance, dimensionality, bias) and different solutions are presented. The studied dataset is small and unbalanced. In addition, a bias due to the day of acquisition is identified. Different normalization approaches, data augmentation techniques, and the generation of new samples are presented to overcome these problems.
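As a rough illustration of the kind of preprocessing mentioned above, the sketch below standardizes spectra within each acquisition day and then balances the classes by random oversampling. It is not the method used in the thesis, whose full text is not available: the function names, the array layout (one row per breath sample, one column per mass channel), and the choice of per-day standardization and random oversampling are all assumptions made for this example.

```python
import numpy as np


def normalize_per_day(spectra, days, eps=1e-12):
    """Standardize each mass channel separately within each acquisition day.

    Hypothetical helper: `spectra` is (n_samples, n_channels), `days` holds
    one acquisition-day label per sample.
    """
    spectra = np.asarray(spectra, dtype=float)
    days = np.asarray(days)
    out = np.empty_like(spectra)
    for day in np.unique(days):
        mask = days == day
        block = spectra[mask]
        mean = block.mean(axis=0)
        std = block.std(axis=0)
        # eps avoids division by zero for channels that are constant on a day
        out[mask] = (block - mean) / (std + eps)
    return out


def oversample_minority(spectra, labels, seed=0):
    """Randomly duplicate rows of the smaller classes until all classes match."""
    rng = np.random.default_rng(seed)
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    index_parts = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(labels == cls)
        index_parts.append(idx)
        if target > count:
            # draw the missing samples with replacement from the same class
            index_parts.append(rng.choice(idx, size=target - count, replace=True))
    order = np.concatenate(index_parts)
    return spectra[order], labels[order]
```

Note that per-day standardization can also remove genuine class signal if a given day contains samples of only one class, and that oversampling should be applied to the training split only, so that duplicated samples do not leak into the evaluation data.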

Supervisors: Giovanni Squillero, Riccardo Cantoro, Nicolo' Bellarmino
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 79
Additional Information: Confidential thesis. Full text not available
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: NanoTech Analysis srl
URI: http://webthesis.biblio.polito.it/id/eprint/22677