polito.it
Politecnico di Torino (logo)

A density-based method for scalable outlier detection in large datasets

Matteo Corain

A density-based method for scalable outlier detection in large datasets.

Rel. Paolo Garza, Alessandro Campi. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2020

[img]
Preview
PDF (Tesi_di_laurea) - Tesi
Licenza: Creative Commons Attribution Non-commercial No Derivatives.

Download (18MB) | Preview
Abstract:

DBSCAN is one of the most well-known algorithm in the field of density-based clustering, although its applicability to large datasets is generally disputed due to its high complexity. The aim of this work is to propose a new, parallel, Spark-based procedure for the sole purpose of anomaly detection, in a way which is coherent to the DBSCAN definition and suitable for the big data context. From a theoretical side, this algorithm is characterized by a worst-case performance boundary that depends linearly on the size of the dataset; in practical tests, it outperforms available solutions both in terms of result quality and overall scalability when the data grow large.

Relatori: Paolo Garza, Alessandro Campi
Anno accademico: 2019/20
Tipo di pubblicazione: Elettronica
Numero di pagine: 102
Soggetti:
Corso di laurea: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Classe di laurea: Nuovo ordinamento > Laurea magistrale > LM-32 - INGEGNERIA INFORMATICA
Ente in cotutela: UNIVERSITY OF ILLINOIS AT CHICAGO (STATI UNITI D'AMERICA)
Aziende collaboratrici: NON SPECIFICATO
URI: http://webthesis.biblio.polito.it/id/eprint/15378
Modifica (riservato agli operatori) Modifica (riservato agli operatori)