Semi-supervised Tree-based Anomaly Detection

Luca Stradiotti

Semi-supervised Tree-based Anomaly Detection.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2022

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

In many real-world applications, abnormal behavior must be detected immediately to avoid dangerous situations. Several automated approaches have been proposed to analyze the collected data and identify critical and dangerous patterns. Anomaly detection has traditionally been treated as an unsupervised learning task because no labeled instances were available: obtaining training labels is expensive and time-consuming, since experts must carefully inspect the data and assign the labels. Today, however, a small number of labels is often available, and many semi-supervised models have been studied that significantly improve on unsupervised performance. Semi-supervised models fall into three categories depending on their approach. One category consists of tree-based models, which learn to separate anomalies from normal data by building an ensemble of trees. Although these models are very powerful, they have received little attention in the literature because it is difficult to use both unlabeled and labeled information during tree construction. This work therefore proposes a novel semi-supervised tree-based approach. The model learns from both the available labeled instances and the unlabeled data to partition the space into regions that distinguish normal samples from outliers. The model is evaluated on several benchmark datasets, and its performance is compared with state-of-the-art algorithms. Empirically, the proposed approach outperforms the unsupervised and semi-supervised baselines on most of the datasets used.

Supervisors: Paolo Garza
Academic year: 2022/23
Publication type: Electronic
Number of pages: 55
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New system > Master's degree > LM-32 - Computer Engineering (INGEGNERIA INFORMATICA)
Joint supervision institution: KUL - KATHOLIEKE UNIVERSITEIT LEUVEN (BELGIUM)
Collaborating companies: KU Leuven
URI: http://webthesis.biblio.polito.it/id/eprint/24688