Politecnico di Torino

Exploring association of several variables using mutual information

Alessandra Serra

Alessandra Serra. Exploring association of several variables using mutual information. Supervisor: Mauro Gasparini. Politecnico di Torino, Master's degree programme in Mathematical Engineering, 2018.

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution-NonCommercial-NoDerivatives.

Download (5MB) | Preview

This work focuses on methods of data exploration based on mutual information and other related information measures. In particular, the author proposes a method to discover pairwise correlations among variables and to group the variables into clusters. The thesis presents the work done by the author during her internship at Tetra Pak.

The company's core products are filling machines. The performance of a machine that fills shelf-stable food packages is called its aseptic performance, defined as the long-run ratio of the number of packages that are not commercially sterile to the total number of packages filled by the machine. Tetra Pak collects large amounts of data in order to improve aseptic performance. In real-world applications such as the characterisation of aseptic performance, the dependencies among variables are often unknown and are almost always non-linear.

The aim of this study was to find a way to discover correlations among continuous and categorical variables in large datasets. Measures of dependence are popular statistics in data mining, and handling a large number of variables with possibly non-linear dependencies requires an adequate measure. The idea is to explore datasets containing both continuous and categorical variables and to group the variables into clusters using a distance based on mutual information. This measure of dependence is well established in information theory and gives a better understanding of the relationships among the features. The principal use of the proposed method is to find a set of uncorrelated variables with which to build predictive models and explain variables of interest.
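The abstract does not state which mutual-information distance or clustering rule the thesis uses, so the following is only a minimal sketch under stated assumptions: entropies are plug-in estimates from frequency counts, continuous variables are discretised into equal-width bins, categorical variables are used as-is, and the distance is the normalised d(X,Y) = 1 - I(X;Y)/H(X,Y), with I(X;Y) = H(X) + H(Y) - H(X,Y). Variables are then grouped by single linkage at a distance threshold. The function names, the 10-bin default, the threshold of 0.8 and the synthetic variables are hypothetical illustrations, not the author's method or Tetra Pak data.

```python
import math
import random
from collections import Counter
from itertools import combinations


def discretize(values, n_bins=10):
    """Map a continuous variable to integer labels with equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the last bin instead of an extra one
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Plug-in Shannon entropy (in nats) of any sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from empirical frequencies."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def mi_distance(x, y):
    """Normalised distance d = 1 - I(X;Y) / H(X,Y), in [0, 1] up to rounding."""
    h_xy = entropy(list(zip(x, y)))
    if h_xy == 0.0:
        return 0.0  # both variables constant: treat them as identical
    return 1.0 - mutual_information(x, y) / h_xy


def cluster(variables, threshold):
    """Single-linkage grouping: link every pair whose distance is below threshold."""
    parent = {name: name for name in variables}

    def find(a):
        # Union-find root lookup with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in combinations(variables, 2):
        if mi_distance(variables[a], variables[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in variables:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical synthetic data: three continuous variables and one categorical one
rng = random.Random(0)
n = 2000
t = [rng.uniform(0.0, 1.0) for _ in range(n)]
variables = {
    "temp": discretize(t),
    "temp_sq": discretize([(v - 0.5) ** 2 for v in t]),  # non-monotone function of temp
    "noise": discretize([rng.gauss(0.0, 1.0) for _ in range(n)]),
    "shift": ["A" if v < 0.5 else "B" for v in t],  # categorical, used as-is
}

print(cluster(variables, threshold=0.8))
```

With this seed the grouping should place temp, temp_sq and shift together and leave noise on its own. The underlying t and (t - 0.5)^2 have zero Pearson correlation in expectation, because (t - 0.5)^2 is symmetric about t = 0.5, yet their mutual information is large: this is the kind of non-linear dependence a correlation coefficient misses. Single linkage is chosen here only for brevity; average or complete linkage on the same distance matrix would be equally reasonable choices.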

Supervisor: Mauro Gasparini
Academic year: 2018/19
Publication type: Electronic
Number of Pages: 70
Degree programme: Master's degree programme in Mathematical Engineering
Degree class: New organization > Master of Science > LM-44 - Mathematical Modelling for Engineering
Collaborating companies: Tetra Pak Packaging Solutions S.p.A.
URI: http://webthesis.biblio.polito.it/id/eprint/8360