Alessandra Serra
Exploring association of several variables using mutual information.
Supervisor: Mauro Gasparini. Politecnico di Torino, Master's degree in Mathematical Engineering (Ingegneria Matematica), 2018
PDF (Tesi_di_laurea) - Thesis, 5MB
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
This work focuses on methods of data exploration based on mutual information and related information measures. In particular, the author proposes a method to discover pairwise correlations among variables and to group the variables into clusters. The master's thesis presents the work done by the author during her internship at Tetra Pak, whose core products are filling machines. The performance of a machine that fills shelf-stable food packages is called the aseptic performance and is defined as the long-run ratio between the number of packages that are not commercially sterile and the total number of packages filled by the machine. Tetra Pak now collects a large amount of data in order to improve the aseptic performance. In real-world applications such as aseptic performance characterisation, the dependencies among variables are often unknown and are almost always nonlinear. The aim of this study was to find a way to discover correlations among continuous and categorical variables in large datasets. Measures of dependence are popular statistics in data mining, but handling a large number of variables with possibly nonlinear dependencies requires an adequate measure. The idea is to explore datasets containing both continuous and categorical variables and to group the variables into clusters using a distance based on mutual information. This measure of dependence is well established in information theory and can be used to gain a better understanding of the relationships among the features. The principal use of the proposed method is to find a set of uncorrelated variables with which to build predictive models and explain variables of interest. |
---|---|
Supervisors: | Mauro Gasparini |
Academic year: | 2018/19 |
Publication type: | Electronic |
Number of pages: | 70 |
Subjects: | |
Degree programme: | Master's degree in Mathematical Engineering (Ingegneria Matematica) |
Degree class: | New system > Master's degree > LM-44 - Mathematical and Physical Modelling for Engineering |
Partner companies: | Tetra Pak Packaging Solutions S.p.A. |
URI: | http://webthesis.biblio.polito.it/id/eprint/8360 |
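The clustering idea described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation. The abstract does not state the exact distance, so the normalised form d(X, Y) = 1 - I(X; Y) / H(X, Y), the equal-width binning of continuous variables, the single-linkage threshold rule, and all names and toy data (`pressure`, `mode`, `shift`) are assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from empirical frequencies."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def mi_distance(x, y):
    """Assumed distance: d = 1 - I(X;Y) / H(X,Y), clamped to [0, 1]."""
    h_xy = entropy(list(zip(x, y)))
    if h_xy == 0.0:  # both variables are constant: treat them as identical
        return 0.0
    return min(1.0, max(0.0, 1.0 - mutual_information(x, y) / h_xy))


def discretize(values, bins=2):
    """Equal-width binning so a continuous variable can be treated as discrete."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def cluster(columns, threshold=0.5):
    """Single-linkage grouping: variables closer than `threshold` are merged."""
    names = list(columns)
    parent = {name: name for name in names}

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if mi_distance(columns[a], columns[b]) < threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical toy data: one continuous and two categorical variables.
columns = {
    "pressure": discretize([1.0, 1.2, 3.9, 4.1, 0.9, 1.1, 4.0, 3.8]),
    "mode": ["x", "x", "y", "y", "x", "x", "y", "y"],
    "shift": [0, 1, 0, 1, 0, 1, 0, 1],
}
clusters = cluster(columns)
print(clusters)
```

On this toy data the binned `pressure` and the categorical `mode` carry the same information and fall into one cluster, while `shift` stays on its own. Picking one variable per cluster would then give a set of weakly dependent candidate predictors.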