User Behaviour Classification by means of Unsupervised Learning optimized by DNN

Alessandro Bonifazi

Supervisor: Fulvio Giovanni Ottavio Risso. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2019

PDF (Tesi_di_laurea) - Thesis
Restricted to: Repository staff only until 18 June 2021 (embargo date).
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

In recent years, Machine Learning has seen a new surge of use in applications and problems across different domains, where its power enables automation in a variety of situations. This is largely due to the sharp increase in data availability, greater computational power and improvements in Machine Learning techniques. [...] Computer networking can also benefit from this technology [...]. The aim is to study how users exchange information (and which kind of information) through the network, and to find a solid methodology for clustering local area network (LAN) users that share similar behaviour, so that this knowledge can be exploited for resource management and for identifying anomalous activities.

The main challenges facing a developer who tackles a problem of this kind are:
- which dataset should be used to train the model;
- which data features are the most meaningful to extract and how they should be represented;
- which machine learning approach is the best to follow and, finally, which algorithm is the correct one to exploit.

This work starts by capturing traffic in the LAN of interest; using a dataset obtained in this way, instead of relying on a public one, makes it possible to work with data that accurately models the real behaviour of the LAN hosts. After the data cleaning phase, the relevant features used to train the model are extracted from the network flows. Two different sets of features are analysed in this project: the former is based on the distribution of flow destination IP addresses for each host, while the latter exploits the per-user distribution of the kind of service carried by each flow. In particular, the users' IP destination distributions are represented in a peculiar matrix format. Each feature set is used in turn, in a separate training run, to train a novel clustering model that was developed for this project.
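As an illustration of the first feature set, the per-host distribution of flow destination IP addresses can be sketched as a row-normalized host-by-destination matrix. This is only a minimal sketch: the abstract does not specify the matrix format used in the thesis, so the layout, the function name `destination_distributions` and the sample addresses (taken from documentation ranges) are all assumptions made here.

```python
from collections import Counter, defaultdict


def destination_distributions(flows):
    """Build a host-by-destination matrix of flow fractions.

    `flows` is an iterable of (source_host, destination_ip) pairs.
    Returns (hosts, destinations, matrix), where matrix[i][j] is the
    fraction of flows from hosts[i] that go to destinations[j].
    The layout is an assumption, not the format used in the thesis.
    """
    counts = defaultdict(Counter)
    for src, dst in flows:
        counts[src][dst] += 1

    hosts = sorted(counts)
    destinations = sorted({dst for c in counts.values() for dst in c})

    matrix = []
    for host in hosts:
        total = sum(counts[host].values())
        # Destinations never contacted by this host count as zero.
        matrix.append([counts[host][dst] / total for dst in destinations])
    return hosts, destinations, matrix


# Hypothetical flow records: (LAN host, destination IP).
flows = [
    ("10.0.0.1", "203.0.113.5"),
    ("10.0.0.1", "203.0.113.5"),
    ("10.0.0.1", "198.51.100.7"),
    ("10.0.0.2", "198.51.100.7"),
]
hosts, destinations, matrix = destination_distributions(flows)
```

Each row sums to 1, so hosts with very different traffic volumes remain comparable. The same sketch would apply to the second feature set if the destination IP in each pair were replaced by a service category.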
The unsupervised clustering model is built with the aid of a deep neural network, which is trained in a supervised manner to minimize a target function that represents the degree of similarity between users. Once the neural network has been trained and the target function has converged, it is possible to build a confusion matrix that represents the distribution of the neural network's predictions for each host. This confusion matrix is exploited to cluster together users that have similar prediction distributions; the similarity metric used is very simple and consists of the difference between two users' prediction distributions. The result of this analysis is a hierarchical clustering model, with a peculiar dendrogram, in which a similarity threshold determines whether a user is considered inside or outside a cluster.

What emerges from this study is that the clustering results for the IP destinations case and the service categories case differ remarkably. The clustering configurations show how it is possible to predict the users' demand and how this demand is shared among them, and allow it to be inspected at different levels of detail. The fact that the IP-based and the service-based clusterings give different results offers the opportunity to obtain information from different perspectives. If, instead, the aim is to consider both at the same time, it is possible to look for the most similar clustering configuration between the two results.
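The threshold-driven grouping of hosts by their prediction distributions can be illustrated with a small sketch. It is not the thesis's algorithm: the abstract only says that the metric is the difference between two prediction distributions, so the L1 distance, the single-linkage merging rule, the function names and the sample values below are all assumptions.

```python
def l1_distance(p, q):
    """Sum of absolute differences between two distributions (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def threshold_clusters(distributions, threshold):
    """Group hosts whose prediction distributions differ by at most `threshold`.

    `distributions` maps a host name to its row of the (normalized)
    confusion matrix. Hosts are merged with single linkage: two hosts
    end up in the same cluster if a chain of pairs within the threshold
    connects them.
    """
    hosts = list(distributions)
    parent = {h: h for h in hosts}

    def find(h):
        # Union-find lookup with path halving.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    for i, a in enumerate(hosts):
        for b in hosts[i + 1:]:
            if l1_distance(distributions[a], distributions[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for h in hosts:
        clusters.setdefault(find(h), []).append(h)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical per-host prediction distributions (rows of a confusion matrix).
predictions = {
    "host_a": [0.80, 0.15, 0.05],
    "host_b": [0.75, 0.20, 0.05],
    "host_c": [0.10, 0.10, 0.80],
}
tight = threshold_clusters(predictions, threshold=0.2)
loose = threshold_clusters(predictions, threshold=2.0)
```

With the tight threshold, `host_a` and `host_b` form one cluster and `host_c` stays alone; with the loose threshold, all three hosts merge. Sweeping the threshold from small to large values yields the successive merge levels that a dendrogram displays.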

Supervisors: Fulvio Giovanni Ottavio Risso
Academic year: 2019/20
Publication type: Electronic
Number of Pages: 105
Subjects:
Degree programme: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Co-supervising institution: Starflow S.L. (Spain)
Collaborating companies: Starflow S.L.
URI: http://webthesis.biblio.polito.it/id/eprint/13127