Design and development of intelligent algorithms for detecting Volatile Organic Compounds (VOCs) from raw data acquired through highly miniaturized devices

Alberto Ricatto

Supervisors: Giovanni Squillero, Nicolo' Bellarmino. Politecnico di Torino, Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica), 2024

Abstract:

Helicobacter pylori is a bacterium whose activity causes ulcers in the human stomach, a well-known risk factor for gastric cancer. The standard hospital protocol for its detection is the "13C Urea Breath Test" (UBT), which detects isotope-labeled carbon dioxide in a breath sample from a patient who has previously ingested a pill containing labeled urea. If the bacteria are present, the urea is broken down, producing labeled carbon dioxide, which the patient then exhales. The sample is then analysed for this Volatile Organic Compound with a mass spectrometer, such as those provided by NanoTech Analysis S.r.l., at a lower cost than the standard medical exams. The aim of this study is to use statistical and machine learning methods to better understand the behavior of such instruments, in order to optimize the whole data acquisition process in terms of cost, accuracy and reliability. The data come from two different machines and two different operating modes. The first machine acquires a small number of spectra over a wide range of atomic mass units (AMUs), yielding a varied set of data and allowing a "holistic" approach that could reveal information about the measuring environment. This approach was eventually abandoned because of the slow acquisition speed and the large number of undesired instrumental effects, which affected not only the precision of the measurements but also their reliability. The second machine shows far fewer of the unwanted behaviors observed on the first spectrometer. Initially, spectra were acquired in a mode similar to the one used before, over a narrower range of AMUs and in greater number per patient. The company later decided to change operating mode, since the new mode gave more precise measurements in less time, making the analysis of a sample faster and the machine more competitive.
First, a database for fast and reliable data access was built from the raw spectra files, and several data normalization techniques were set up to address instrumental biases. Statistical methods such as the Kolmogorov-Smirnov test, TOST (two one-sided tests), the Friedman test and bootstrap methods were explored to estimate instrument reliability and confidence, also in combination with ML filtering techniques such as Isolation Forest, Local Outlier Factor and One-Class SVM. Based on the distribution of the outlying spectra detected by these methods, a synthetic data generation approach was devised to find a reliable confidence measure that was not too tightly tied to the final positivity score. Using a similar approach, a Cartesian product plus majority voting experiment was carried out with different ML classifiers; finally, LSTM models were designed for spectra treated as time series, exploring several feature construction techniques, architectures and confidence measures. The last part of the analysis focused instead on the automatic detection of broken air samples. Overall, the analyses performed have helped, and will continue to help, improve the process by offering insight into instruments and procedures, as well as a foundation for further and deeper work. All the information contained in this document is intended to be strictly confidential and is proprietary to NanoTech Analysis S.r.l.; any unauthorized disclosure, reproduction, or use is strictly prohibited.
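As a rough illustration of two of the statistical tools named in the abstract, the sketch below computes a two-sample Kolmogorov-Smirnov statistic and a percentile bootstrap confidence interval using only the Python standard library. It is not the thesis's implementation, which is confidential: the function names `ks_statistic` and `bootstrap_ci`, the parameter defaults, and the synthetic "machine" readings are all assumptions made for this example.

```python
import random
from bisect import bisect_right
from statistics import mean


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is attained at one of the observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def bootstrap_ci(sample, stat=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and collect the statistic of each resample.
    estimates = sorted(
        stat([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-ins (assumed values) for repeated readings of one
    # peak intensity on two instruments; not real spectrometer data.
    machine_a = [rng.gauss(1.00, 0.05) for _ in range(200)]
    machine_b = [rng.gauss(1.03, 0.05) for _ in range(200)]
    print("KS statistic:", round(ks_statistic(machine_a, machine_b), 3))
    print("95% bootstrap CI for the mean of A:", bootstrap_ci(machine_a))
```

A KS statistic near 0 suggests the two sets of readings follow similar distributions, while a value near 1 indicates almost no overlap. In practice a library routine that also returns a p-value would usually be preferred over a hand-rolled statistic.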

Supervisors: Giovanni Squillero, Nicolo' Bellarmino
Academic year: 2024/25
Publication type: Electronic
Number of pages: 74
Additional information: Confidential thesis. Full text not available
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica)
Degree class: New regulations > Master's degree > LM-32 - INGEGNERIA INFORMATICA (Computer Engineering)
Collaborating companies: NanoTech Analysis srl
URI: http://webthesis.biblio.polito.it/id/eprint/33187