Machine Learning for automatic assessment of the risk related to web tracking

Marzia Maffei

Supervisors: Marco Mellia, Martino Trevisan, Luca Vassio. Politecnico di Torino, Master's degree programme in Ict For Smart Societies (Ict Per La Società Del Futuro), 2020

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

This work aims to understand today's tracking ecosystem and to use machine learning tools to automatically assess the risk associated with web trackers, assigning a risk indicator score to websites. The web is a highly dynamic ecosystem, and each user browses dozens of websites every day, encountering a large number of trackers. Trackers can be more or less malicious, collecting different kinds and amounts of data in order to build user profiles, and users are often unaware of their presence. Assigning a risk indicator to websites would make users more aware of the web ecosystem as a whole and would improve their experience, as a first step toward better protection of their data. In this thesis, machine learning algorithms are used to perform two different classifications: the first separates first-party web domains from generic third-party ones, and the second separates tracker domains from all other domains. A set of features extracted from HTTP requests is used for both classifications. A risk indicator score is then assigned to first-party websites depending on the number of trackers contacted and their estimated pervasiveness. Trackers with high pervasiveness appear on many different web pages and can therefore collect a larger amount of data, which makes them more dangerous for the user's privacy. The results of this work, from both the classification and the risk indicator score assignment, give a picture of the web and of its tracking ecosystem, showing how widespread trackers are, even though they often go unnoticed by users in everyday activities.
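
As a rough illustration of the scoring idea described in the abstract, the sketch below assumes that a tracker's pervasiveness can be estimated as the fraction of observed first-party sites on which it appears, and that a site's risk indicator is the sum of the pervasiveness of the distinct trackers it contacts. Both definitions, the function names, and the example domains are assumptions made for illustration; the abstract does not state the thesis's actual formula.

```python
from collections import defaultdict


def pervasiveness(site_trackers):
    """Assumed definition: fraction of observed first-party sites
    on which each tracker appears at least once."""
    n_sites = len(site_trackers)
    counts = defaultdict(int)
    for trackers in site_trackers.values():
        for tracker in set(trackers):  # count each tracker once per site
            counts[tracker] += 1
    return {tracker: count / n_sites for tracker, count in counts.items()}


def risk_scores(site_trackers):
    """Assumed scoring rule: sum of the pervasiveness of the distinct
    trackers contacted by each first-party site."""
    perv = pervasiveness(site_trackers)
    return {
        site: sum(perv[tracker] for tracker in set(trackers))
        for site, trackers in site_trackers.items()
    }


# Hypothetical observations: first-party site -> tracker domains contacted.
observations = {
    "news.example": ["tracker-a.example", "tracker-b.example"],
    "shop.example": ["tracker-a.example"],
    "blog.example": [],
}

scores = risk_scores(observations)
for site, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{site}: {score:.2f}")
```

Under these assumptions, a site scores higher both when it contacts more trackers and when those trackers are seen on many other sites, which mirrors the two factors named in the abstract.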

Supervisors: Marco Mellia, Martino Trevisan, Luca Vassio
Academic year: 2020/21
Publication type: Electronic
Number of pages: 49
Subjects:
Degree programme: Master's degree programme in Ict For Smart Societies (Ict Per La Società Del Futuro)
Degree class: New organization > Master's degree > LM-27 - TELECOMMUNICATIONS ENGINEERING
Collaborating companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/15992