Efficiency and Generalization in Federated Learning: Insights from Sharpness-Aware Minimization

Pietro Cagnasso

Supervisors: Barbara Caputo, Debora Caldarola, Marco Ciccone. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2023

Abstract:	Federated Learning (FL) is a distributed Machine Learning approach that enables the training of neural networks across multiple devices (i.e., clients) holding privacy-protected data. In FL, only model parameters are exchanged during training, so sensitive user data never leaves the device. However, in real-world scenarios, the clients' data is often non-i.i.d. because of individual behaviors and geographical locations, which can in turn lead to slower and less stable optimization. Recent research has addressed this challenge by focusing on the loss landscape, drawing on the connection between converging toward flatter minima and achieving better generalization. Specifically, it was shown that leveraging Sharpness-Aware Minimization (SAM) during local training, which seeks minima with both low loss value and low sharpness, improves the performance of federated global models. However, these works present three main flaws: (i) SAM doubles the computation per step relative to traditional methods, which is costly for resource- and battery-constrained devices; (ii) the advanced approaches that consider global sharpness also double the communication costs; (iii) these works overlook recent SAM variants that offer improved performance or cost reduction. In light of the pressing concerns surrounding climate change and the imperative to maximize resource efficiency, this thesis investigates the use of recent SAM variants engineered for cost reduction. Our experiments reveal that cost-reducing methods not only match the original implementations but can also outperform them. Building on these findings, we propose a new optimizer, SALT, which reduces the computational cost associated with SAM while also achieving better performance and flatter minima. Furthermore, recent studies have demonstrated that optimizing for local flatness may exacerbate inconsistency across clients.
To mitigate this, researchers have proposed the exchange of perturbed gradients to build a global sharpness profile. We identify two essential SAM components that can be shared across devices and leveraged to approach global sharpness minimization. Our contributions, GlobalEPS and GlobalGS, exchange either the perturbations or the perturbed-model gradients, respectively. These quantities are averaged and transmitted alongside the aggregated model during the next communication round. Notably, both methods require sharing twice as many bytes between clients and server, a cost that cannot be ignored. To address this, we introduce sharpness-aware ascent and descent steps directly on the server side, removing the need to send additional information. All these approaches can coexist with the aforementioned cost-efficient, sharpness-aware local optimizers. The effectiveness of sharpness-aware methods is extensively documented in the FL literature, primarily in the context of computer vision tasks such as image classification. In addition to these established applications, we ran a few experiments on a Natural Language Processing task, next-character prediction. These experiments showed faster convergence, higher final accuracy, or an improvement in the metrics used to assess sharpness.
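To make the cost argument in the abstract concrete, the following is a minimal sketch of a single generic SAM update in plain Python. It is not the thesis code (the full text is not available) and it does not represent SALT, GlobalEPS, or GlobalGS. The function name `sam_step`, the hyperparameters `lr` and `rho`, and the toy quadratic loss are all assumptions chosen for illustration. The sketch shows why SAM costs two gradient evaluations per step: one to build the ascent perturbation and one at the perturbed point for the descent.

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic SAM update on a list of floats (illustrative only)."""
    # First gradient evaluation: direction of steepest ascent at w.
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Ascent step: move to the approximate worst-case point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Second gradient evaluation: gradient at the perturbed point.
    g_adv = grad_fn(w_adv)
    # Descent step: update the ORIGINAL weights with the perturbed gradient.
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Hypothetical toy problem: a quadratic loss with different curvature per axis.
CURVATURE = [10.0, 1.0]
grad_calls = 0  # counts gradient evaluations to expose the doubled cost

def loss(w):
    return 0.5 * sum(c * wi * wi for c, wi in zip(CURVATURE, w))

def grad(w):
    global grad_calls
    grad_calls += 1
    return [c * wi for c, wi in zip(CURVATURE, w)]

w0 = [1.0, 1.0]
w1 = sam_step(w0, grad)
```

After this single step, `grad_calls` equals 2, whereas a plain gradient-descent step would need one evaluation. This per-step doubling is the overhead that cost-reducing SAM variants aim to lower.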
Supervisors:	Barbara Caputo, Debora Caldarola, Marco Ciccone
Academic year:	2023/24
Publication type:	Electronic
Number of pages:	102
Additional information:	Confidential thesis. Full text not available
Subjects:
Degree programme:	Master's degree programme in Data Science And Engineering
Degree class:	New degree system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies:	NOT SPECIFIED
URI:	http://webthesis.biblio.polito.it/id/eprint/29432
