
On the Challenges of Class Imbalance in Federated Learning for Semantic Segmentation

Eros Fani'

On the Challenges of Class Imbalance in Federated Learning for Semantic Segmentation.

Supervisors: Barbara Caputo, Debora Caldarola, Fabio Cermelli, Antonio Tavera. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2021

Abstract:

Today we live in a hyper-connected world, where billions of devices produce huge amounts of data every day. In such a scenario, developing algorithms that can run on edge devices while preserving user privacy has become of the utmost importance. Federated Learning (FL) is a recent field of machine learning born to address data privacy, data security and data access issues. Its innovation lies in a framework that makes it possible to exploit privacy-protected data without breaking any regulation. Data is accessed only locally, and only the model is exchanged among the devices (i.e. the clients) of the network. After downloading the model from a central server, each client trains it locally on its own data and then sends back the updated parameters, which are finally aggregated on the server side according to a chosen algorithm. Consequently, FL settings have inherent privacy and network communication advantages over the standard centralized learning paradigm. The world of self-driving cars has several points of contact with FL: each car is a client with access to a large amount of privacy-protected data generated by its on-board cameras and sensors. Autonomous vehicles, for their part, need to perceive their surroundings reliably and take safe actions accordingly, without injuring the driver or passersby. Semantic Segmentation (SS) is therefore the ideal task to focus on, and the most common one in the self-driving car sector, since it allows the car to sense obstacles and signals and to know their exact spatial location. FL, in turn, provides an elegant and efficient privacy-preserving solution. In the literature, no work has yet focused on FL combined with SS in an autonomous driving context. Hence, the primary objective of this thesis is to propose a comprehensive benchmarking framework that combines FL and SS on domain-specific datasets related to the world of self-driving cars.
The first aspect investigated concerns the constraints that the federated scenario imposes on model complexity. Models that require too many computational resources cannot be used, since autonomous vehicles cannot be expected to carry the hardware usually employed for deep learning. Therefore, several lightweight models for SS are compared in terms of number of trainable parameters, memory occupation and inference speed. The chosen model is then tested in both traditional (centralized) and federated scenarios to highlight the differences in its learning behavior. Two well-known FL algorithms are introduced and compared on two datasets from the autonomous driving sector. The simulations vary the settings of the federated algorithms together with the clients' data distribution, so as to simulate possible domain shifts and statistical heterogeneity. Finally, Federated Image Segmentation (FedSeg), an innovative method able to work on top of any FL algorithm, is proposed. Its aim is to address the class imbalance issue by perturbing the importance of the classes on the basis of the training itself. As in any FL algorithm, the server aggregates this information and provides it to the clients in the next round. The method has shown an increase in performance in the early stages of training for the most unbalanced settings.
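The communication round described in the abstract (download the model, train locally, send back the parameters, aggregate on the server) can be illustrated with a minimal sketch. The abstract does not name the aggregation algorithm, so the example assumes one common choice, an average weighted by each client's number of samples, in the style of federated averaging. The toy 1-D linear model and the names `local_update`, `aggregate` and `run_round` are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

# A model is represented as a dict of named parameter vectors.
Params = Dict[str, List[float]]
Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by a client


def local_update(global_params: Params, data: Dataset, lr: float = 0.1) -> Params:
    """Hypothetical client step: one SGD pass on a toy model y = w * x + b.

    The raw data never leaves this function; only parameters are returned.
    """
    w = global_params["w"][0]
    b = global_params["b"][0]
    for x, y in data:
        err = (w * x + b) - y  # gradient of 0.5 * err**2 w.r.t. the prediction
        w -= lr * err * x
        b -= lr * err
    return {"w": [w], "b": [b]}


def aggregate(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Server step (assumed): average weighted by each client's sample count."""
    total = sum(client_sizes)
    return {
        name: [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(values))
        ]
        for name, values in client_params[0].items()
    }


def run_round(global_params: Params, clients: List[Dataset]) -> Params:
    """One communication round: broadcast, local training, aggregation."""
    updates = [local_update(global_params, data) for data in clients]
    sizes = [len(data) for data in clients]
    return aggregate(updates, sizes)
```

Calling `run_round` repeatedly with the returned parameters simulates successive rounds. Weighting by sample count lets clients with more data influence the global model more, which is one reason heterogeneous client distributions matter in this setting.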

Supervisors: Barbara Caputo, Debora Caldarola, Fabio Cermelli, Antonio Tavera
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 108
Additional Information: Restricted thesis. Full text not available
Subjects:
Degree programme: Master's degree programme in Data Science and Engineering
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/20566