
Towards Real World Federated Learning

Debora Caldarola

Towards Real World Federated Learning.

Supervisors: Barbara Caputo, Fabio Galasso, Massimiliano Mancini. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2020


Federated Learning, also known as Collaborative Learning, is a relatively new field of Machine Learning, born in 2015 to address critical matters such as data privacy, data security and data access. In this scenario, a central server model is trained by exploiting data stored locally on multiple devices (i.e. the clients). Unlike the standard machine learning setting, the model has no direct access to the data: a fundamental requirement for any application where the user's privacy must be preserved (e.g. medical records, bank transactions). The server model is asynchronously sent to the clients, which train it on their own local data. The parameters of the locally updated models are then sent back to the server, and its central model is updated accordingly. In this way, the clients' data never leave their devices, preserving privacy.

This technique raises several challenges. In particular, federated learning models usually rely on two main assumptions. The first is that a single central model can work efficiently across several users. This may not hold in practice, since distinct clients may draw their inputs from different distributions (e.g. different speakers). The second is that each client locally stores supervised data. This is easily violated, since labelling is costly and it is not possible to annotate every data sample automatically for every task (e.g. object recognition).

The aim of this thesis is to build a machine learning model in a Federated Learning scenario that addresses these two issues. For the first, we merge federated learning with graph learning, a deep learning technique that exploits graph-structured data as feature information. On the server side, a graph is used to model user specificity: each node captures the characteristics of a different subset of the clients (i.e. a domain). Clients and server train multiple models, each referring to one node of the graph, thus producing a final domain-specialist model. At test time, a new user is compared with the nodes of the graph, and the initial parameters of their models are set according to the most similar domains. For the second issue, we cope with the possible lack of supervision on the clients by using semi-supervised training objectives. All results are compared with current state-of-the-art Federated Learning algorithms on the LEAF benchmark.
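The basic training loop described in the abstract (the server sends its model to the clients, each client trains on local data, and the server updates its model from the returned parameters) can be illustrated with a minimal sketch. The abstract does not state which aggregation rule the thesis uses, so this example assumes the common FedAvg-style weighted average. It also runs rounds synchronously for simplicity and uses a toy linear model on synthetic data. All function names here (`local_update`, `aggregate`, `federated_round`, `make_client`) are hypothetical and not taken from the thesis.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Client side: a few epochs of SGD on a linear model, using only local data."""
    w = list(weights)  # copy, so the server's parameters are not modified in place
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def aggregate(client_weights, client_sizes):
    """Server side: average client parameters, weighted by local dataset size
    (an assumed FedAvg-style rule, not confirmed by the abstract)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def federated_round(server_weights, clients):
    """One communication round: only parameters travel, never the raw data."""
    updates = [local_update(server_weights, data) for data in clients]
    sizes = [len(data) for data in clients]
    return aggregate(updates, sizes)


def make_client(n, rng, true_w=(2.0, -1.0)):
    """Synthetic, noise-free local dataset for illustration only."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = sum(wi * xi for wi, xi in zip(true_w, x))
        data.append((x, y))
    return data


rng = random.Random(0)
clients = [make_client(n, rng) for n in (20, 30, 50)]
weights = [0.0, 0.0]
for _ in range(20):
    weights = federated_round(weights, clients)
```

In this toy setting every client shares the same underlying input distribution, so a single averaged model fits all of them. The first assumption questioned in the abstract concerns exactly the case where this does not hold.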

Supervisors: Barbara Caputo, Fabio Galasso, Massimiliano Mancini
Academic year: 2020/21
Publication type: Electronic
Number of Pages: 97
Additional Information: Thesis under embargo. Full text not available
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/15941