Federated Learning Meets Model Compression for Image Classification on Memory-Constrained Devices

Alessandro Masci

Supervisors: Alessio Sacco, Flavio Esposito, Guido Marchetto. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Electronic devices and sensors are now pervasive, and the need for them to cooperate grows daily. At the same time, the safety and privacy of the data they use must be guaranteed. Many of the devices that assist us in daily activities are compact appliances with constrained computational and storage capabilities, so solutions are needed that improve memory efficiency without compromising accuracy. Inference speed is another key factor in their effectiveness: the vast amount of data generated every day demands ever shorter inference times. This work looks for solutions that address all of these requirements together. One of the most widely used methodologies for letting different devices collaboratively improve the reliability of artificial intelligence models is Federated Learning (FL). With this approach, devices share only locally computed model weights rather than transmitting personal data across the network, which preserves privacy. An effective way to reduce a model's size while maintaining its accuracy is to apply model compression techniques such as weight pruning, which lets the model use only a selected subset of its weights to perform inference. During the forward pass, the zero-masked weights are not considered, resulting in a shorter inference time. In this work, we studied three pruning strategies: Global Unstructured Pruning (GUP), Local Unstructured Pruning (LUP), and Local Structured Pruning (LSP). Each was evaluated at pruning percentages of 20%, 40%, 60%, and 80%. We then analyzed how these approaches behave in an FL environment with a network of 4, 8, or 12 clients arranged in a Ring-All-Reduce topology (first set of simulations) and in a Consensus-Based topology (second set).
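To make the difference between the three pruning strategies concrete, the following is a minimal pure-Python sketch on a toy two-layer model. It is not the thesis implementation: the function names, the use of weight magnitude as the ranking criterion, and the choice of the row L2 norm for structured pruning are all assumptions made for illustration.

```python
from typing import Dict, List

Layer = List[List[float]]   # one row per output unit (hypothetical toy layout)
Model = Dict[str, Layer]


def _copy(model: Model) -> Model:
    return {name: [row[:] for row in layer] for name, layer in model.items()}


def global_unstructured(model: Model, fraction: float) -> Model:
    """Zero the smallest-magnitude weights, ranked across all layers at once."""
    ranked = sorted(
        (abs(w), name, i, j)
        for name, layer in model.items()
        for i, row in enumerate(layer)
        for j, w in enumerate(row)
    )
    pruned = _copy(model)
    # The pruning budget is floored to a whole number of weights.
    for _, name, i, j in ranked[: int(len(ranked) * fraction)]:
        pruned[name][i][j] = 0.0
    return pruned


def local_unstructured(model: Model, fraction: float) -> Model:
    """Apply the same pruning fraction inside each layer independently."""
    pruned: Model = {}
    for name, layer in model.items():
        pruned.update(global_unstructured({name: layer}, fraction))
    return pruned


def local_structured(model: Model, fraction: float) -> Model:
    """Zero whole rows per layer, dropping those with the smallest L2 norm (assumed criterion)."""
    pruned: Model = {}
    for name, layer in model.items():
        norms = sorted((sum(w * w for w in row), i) for i, row in enumerate(layer))
        drop = {i for _, i in norms[: int(len(layer) * fraction)]}
        pruned[name] = [
            [0.0] * len(row) if i in drop else row[:] for i, row in enumerate(layer)
        ]
    return pruned


toy_model: Model = {
    "fc1": [[0.9, -0.8], [0.7, 0.6]],
    "fc2": [[0.05, -0.01], [0.02, 0.5]],
}

for strategy in (global_unstructured, local_unstructured, local_structured):
    print(strategy.__name__, strategy(toy_model, 0.5))
```

On this toy model, global pruning at 50% zeroes every weight in `fc2` and leaves `fc1` intact, because all of the smallest magnitudes sit in one layer, whereas both local variants remove half of each layer. The structured variant removes entire rows, which is a much coarser cut than removing individual weights.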
The experiments investigate the behavior of several neural networks: ResNet18, ResNet50, VGG16, MobileNetV2, TinyYoloV2, and LeNet5. They were run in a customized environment built with Docker and physical GPUs available on Chameleon Cloud, a publicly available testbed. In particular, Docker was used to create a local network with a configurable number of clients and to set up the network topology on a single device. The findings highlight considerable differences between the pruning techniques and make it possible to indicate the best configuration settings, e.g., type of neural network, number of clients, and network topology. In the majority of scenarios, both GUP and LUP can reduce model size by up to 40% without a significant loss of accuracy. Conversely, LSP consistently yields very low accuracy, even at a modest pruning percentage. These findings offer insight into the development and use of neural network models on resource-constrained devices, and into the potential of a federated environment to mitigate the impact of model pruning. A further point of interest is how the various pruning techniques interact with different neural networks and how they affect the volume of data transmitted over the network during federated algorithms.
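The Ring-All-Reduce arrangement used in the first set of simulations can be illustrated with a single-process simulation. The sketch below assumes that aggregation is a plain unweighted average of the clients' weight vectors, which the abstract does not specify. The function name is hypothetical, and no networking or Docker setup is modelled: each "send" is just a copy between in-memory lists.

```python
from typing import List


def ring_all_reduce_mean(client_weights: List[List[float]]) -> List[List[float]]:
    """Simulate ring all-reduce; every client ends with the element-wise mean."""
    n = len(client_weights)
    size = len(client_weights[0])
    # Split the weight vector into n contiguous chunks (lengths may differ by one).
    bounds = [(c * size) // n for c in range(n + 1)]
    chunks = [range(bounds[c], bounds[c + 1]) for c in range(n)]
    buf = [list(w) for w in client_weights]

    def exchange(step: int, offset: int, accumulate: bool) -> None:
        # Snapshot all outgoing chunks first so the sends behave as simultaneous.
        outgoing = []
        for r in range(n):
            c = (r + offset - step) % n
            outgoing.append((c, [buf[r][i] for i in chunks[c]]))
        # Client r passes its chunk to its ring neighbour r + 1.
        for r, (c, payload) in enumerate(outgoing):
            dst = (r + 1) % n
            for i, v in zip(chunks[c], payload):
                buf[dst][i] = buf[dst][i] + v if accumulate else v

    # Phase 1 (scatter-reduce): after n - 1 steps, client r holds the
    # complete sum for chunk (r + 1) mod n.
    for step in range(n - 1):
        exchange(step, 0, True)
    # Phase 2 (all-gather): the completed chunks circulate around the ring.
    for step in range(n - 1):
        exchange(step, 1, False)
    return [[v / n for v in w] for w in buf]


clients = [[float(r + i) for i in range(6)] for r in range(4)]
print(ring_all_reduce_mean(clients)[0])
```

In this scheme each client sends 2(n - 1) chunks, each roughly 1/n of the model, so the traffic per client stays close to twice the model size regardless of how many clients join the ring.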
Supervisors:	Alessio Sacco, Flavio Esposito, Guido Marchetto
Academic year:	2024/25
Publication type:	Electronic
Number of pages:	71
Subjects:
Degree programme:	Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class:	New organization > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating organizations:	Saint Louis University
URI:	http://webthesis.biblio.polito.it/id/eprint/33110
