polito.it
Politecnico di Torino (logo)

A density-based method for scalable outlier detection in large datasets

Matteo Corain

A density-based method for scalable outlier detection in large datasets.

Rel. Paolo Garza, Alessandro Campi. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2020

[img]
Preview
PDF (Tesi_di_laurea) - Tesi
Licenza: Creative Commons Attribution Non-commercial No Derivatives.

Download (18MB) | Preview
Abstract:

DBSCAN is one of the most well-known algorithm in the field of density-based clustering, although its applicability to large datasets is generally disputed due to its high complexity. The aim of this work is to propose a new, parallel, Spark-based procedure for the sole purpose of anomaly detection, in a way which is coherent to the DBSCAN definition and suitable for the big data context. From a theoretical side, this algorithm is characterized by a worst-case performance boundary that depends linearly on the size of the dataset; in practical tests, it outperforms available solutions both in terms of result quality and overall scalability when the data grow large.

Relators: Paolo Garza, Alessandro Campi
Academic year: 2019/20
Publication type: Electronic
Number of Pages: 102
Subjects:
Corso di laurea: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Classe di laurea: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Ente in cotutela: UNIVERSITY OF ILLINOIS AT CHICAGO (STATI UNITI D'AMERICA)
Aziende collaboratrici: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/15378
Modify record (reserved for operators) Modify record (reserved for operators)