Design and Implementation of a privacy-preserving framework for Machine Learning

Giovanni Camarda

Design and Implementation of a privacy-preserving framework for Machine Learning.

Supervisors: Marco Mellia, Martino Trevisan, Nikhil Jha. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2021

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Over the last decade, a myriad of new technologies has changed the way society experiences everyday life, embodying the defining traits of the Big Data era. Almost every technological scenario produces an enormous amount of data, from disparate physical sources and at very different generation rates, creating an interconnected and interdependent network of people and data. For this reason, data has become a strategic asset for companies and organizations: it drives business, enables user-tailored services, and secures a stronger position in data markets. More and more companies collect and process customers' personal data, demanding it in exchange for services and forcing users to accept a transaction in which power is unevenly distributed. To tackle this situation, regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) took effect in 2018 and 2020, enforcing data protection in the European Union and the State of California, respectively. Their primary goal is to support the free flow of data, build conditions of trust, and rebalance power in the relationship between companies and customers. In this context, legal frameworks are necessary but not sufficient, since the absence of an international standard for technically implementing data protection in data processing activities is a serious obstacle for companies. The European project PIMCity aims to narrow the gap between regulations and practical privacy-preserving solutions by providing a modular framework with which companies can implement ad hoc instruments. The PIMCity project provides four interoperating components: the Personal Data Safe (P-DS), which stores data from various sources; the Personal Privacy-Preserving Analytics (P-PPA), which makes it possible to extract information while preserving privacy; the Personal Consent Manager (P-CM), which models the notion of user consent; and the Personal Privacy Metrics (P-PM), which enhance users' awareness of their shared data.
This thesis presents a generic, fully fledged P-PPA module whose input data can be in either structured or unstructured format. The project pipeline was developed in Python, exposes a REST API for interaction, and exploits privacy properties such as k-anonymity and differential privacy. This module is used as the starting point to define a Machine Learning framework that analyzes the amount of information that can be gathered from anonymized data. I further propose a deeper inquiry into the correlation between increasingly strict privacy constraints and the residual information level in the output of Machine Learning algorithms.
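To make the two privacy properties named above concrete, the following is a minimal sketch in Python using only the standard library. It is not taken from the thesis or from the PIMCity code base: the function names (`is_k_anonymous`, `dp_count`), the field names, and the sample records are all hypothetical and chosen for illustration. The first function checks whether a table satisfies k-anonymity over a set of quasi-identifiers; the second answers a counting query with Laplace noise, a common way to obtain epsilon-differential privacy for queries of sensitivity 1.

```python
import random
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k records (hypothetical helper)."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    A counting query has sensitivity 1, so Laplace noise with
    scale 1/epsilon is added. The noise is drawn as the difference
    of two exponential samples, which follows a Laplace distribution.
    """
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    rate = epsilon  # rate = 1 / scale, with scale = 1 / epsilon
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


# Illustrative, made-up records: quasi-identifiers are already generalized.
records = [
    {"age_range": "20-29", "zip_prefix": "101", "diagnosis": "flu"},
    {"age_range": "20-29", "zip_prefix": "101", "diagnosis": "asthma"},
    {"age_range": "30-39", "zip_prefix": "102", "diagnosis": "flu"},
    {"age_range": "30-39", "zip_prefix": "102", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(records, ["age_range", "zip_prefix"], k=2)
noisy_flu_count = dp_count(
    records, lambda r: r["diagnosis"] == "flu", epsilon=1.0
)
```

In this sketch, a smaller `epsilon` means a larger noise scale and therefore stronger privacy at the cost of accuracy, which is the kind of privacy/utility trade-off the abstract proposes to investigate.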

Supervisors: Marco Mellia, Martino Trevisan, Nikhil Jha
Academic year: 2020/21
Publication type: Electronic
Number of pages: 95
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New system > Master's degree > LM-32 - Computer Engineering
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/18105