A Big Data Solution for Silhouette Computation

Sara Prone

Supervisors: Paolo Garza, Eliana Pastor. Politecnico di Torino, Master's degree in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica), 2019

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

In data analysis, partitioning data into groups based on their characteristics is crucial. This process is called clustering, and its result is a set of groups (clusters) of the original data, in which objects in the same group are more similar to each other than to objects in other groups. Clustering only partitions objects into clusters: at the end of the process the number of objects is unchanged, and each object simply carries the additional information of which group it belongs to. Since real-world data sets are likely to contain huge amounts of data, this work presents a way to reduce this amount while preserving the most important features of the data.
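
The property described above, that objects in the same group are more similar to each other than to objects in other groups, is what the silhouette coefficient quantifies. For an object i, a(i) is its mean distance to the other members of its own cluster, b(i) is its smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)); the silhouette of a clustering is the mean of s(i) over all objects. The following is a minimal sketch of the exact computation, assuming Euclidean distance, points given as coordinate tuples, and at least two clusters; the function name `silhouette` is illustrative and not taken from the thesis.

```python
import math
from collections import defaultdict


def silhouette(points, labels):
    """Mean silhouette coefficient of a clustering (exact, all pairwise distances).

    Illustrative sketch: assumes Euclidean distance and at least two clusters.
    """
    # Group the points by cluster label.
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            # Common convention: an object alone in its cluster gets s(i) = 0.
            scores.append(0.0)
            continue
        # a(i): mean distance to the other members of the same cluster.
        # p itself contributes a distance of 0, hence the len(own) - 1 denominator.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        # Guard against 0/0 when all involved points coincide.
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

For two tight, well-separated pairs such as `[(0, 0), (0, 1), (10, 0), (10, 1)]` with labels `[0, 0, 1, 1]`, the result is about 0.90. Because every object is compared with every other object, the exact computation needs O(n^2) distance evaluations, which becomes expensive on large data sets.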

The idea is simply to summarize the already clustered data by dividing them into cells of a given size and computing a representative object for each cell.
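
The summarization step can be sketched as follows. This is a hypothetical illustration, not the thesis's implementation: it assumes that cells are axis-aligned squares (hypercubes in higher dimensions) of side `cell_size` on a grid anchored at the origin, that cells are formed separately for each cluster so that no cell mixes objects from different clusters, and that a cell's representative is the centroid of its objects, weighted by the number of objects it replaces. The names `summarize` and `cell_size` are illustrative.

```python
import math
from collections import defaultdict


def summarize(points, labels, cell_size):
    """Replace each cluster's points with one weighted representative per grid cell.

    Hypothetical sketch: cells are axis-aligned hypercubes of side `cell_size`,
    and a cell's representative is the centroid of the points it contains.
    Returns a list of (representative, label, weight) tuples.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")

    cells = defaultdict(list)
    for p, lab in zip(points, labels):
        # Integer cell index along each dimension; including the label in the
        # key keeps clusters separate, so a cell never mixes clusters.
        key = (lab, tuple(math.floor(x / cell_size) for x in p))
        cells[key].append(p)

    summary = []
    for (lab, _), members in cells.items():
        dim = len(members[0])
        # Centroid: per-dimension mean of the points in the cell.
        centroid = tuple(sum(q[d] for q in members) / len(members) for d in range(dim))
        # The weight records how many original objects the representative replaces.
        summary.append((centroid, lab, len(members)))
    return summary
```

The number of representatives equals the number of non-empty (cluster, cell) pairs, so larger cells yield a stronger reduction at the price of a coarser approximation of the original data.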