Matteo Corain
A density-based method for scalable outlier detection in large datasets.
Supervisors: Paolo Garza, Alessandro Campi. Politecnico di Torino, Master's degree in Computer Engineering (Ingegneria Informatica), 2020
PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
DBSCAN is one of the best-known algorithms in the field of density-based clustering, although its applicability to large datasets is generally disputed because of its high computational complexity. The aim of this work is to propose a new, parallel, Spark-based procedure for the sole purpose of anomaly detection, in a way that is consistent with the DBSCAN definition and suitable for the big data context. From a theoretical standpoint, the algorithm is characterized by a worst-case performance bound that depends linearly on the size of the dataset; in practical tests, it outperforms available solutions in terms of both result quality and overall scalability as the data grows large. |
---|---|
Supervisors: | Paolo Garza, Alessandro Campi |
Academic year: | 2019/20 |
Publication type: | Electronic |
Number of pages: | 102 |
Subjects: | |
Degree programme: | Master's degree in Computer Engineering (Ingegneria Informatica) |
Degree class: | New regulations > Master's degree > LM-32 - COMPUTER ENGINEERING |
Joint supervision institution: | UNIVERSITY OF ILLINOIS AT CHICAGO (UNITED STATES OF AMERICA) |
Collaborating companies: | NOT SPECIFIED |
URI: | http://webthesis.biblio.polito.it/id/eprint/15378 |
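For readers unfamiliar with the DBSCAN definition that the abstract refers to, the following is a minimal sketch of what it means for a point to be an outlier (noise) under that definition: a point is noise if it is not a core point and has no core point within distance `eps`. This is a brute-force, single-machine illustration with quadratic cost, not the parallel Spark-based procedure proposed in the thesis; the function name `dbscan_outliers` and the sample data are hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def dbscan_outliers(points, eps, min_pts):
    """Return the indices of the points that the standard DBSCAN
    definition labels as noise (brute-force, O(n^2) comparisons)."""
    n = len(points)
    # eps-neighbourhood of every point; each point counts as its own neighbour.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points in its eps-neighbourhood.
    is_core = [len(nb) >= min_pts for nb in neighbours]
    # Noise: not a core point and not within eps of any core point.
    return [
        i
        for i in range(n)
        if not is_core[i] and not any(is_core[j] for j in neighbours[i])
    ]


# Hypothetical data: four points on a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
outliers = dbscan_outliers(points, eps=1.5, min_pts=3)
print(outliers)  # [4]
```

The all-pairs distance computation in this sketch is the kind of cost that makes a naive approach impractical on large datasets, which is the scalability problem the abstract describes.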