
Michele Ferrero
Flow Scheduling for Collective Communications in Data Centers.
Supervisor: Fulvio Giovanni Ottavio Risso. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025
PDF (Tesi_di_laurea)
- Thesis
Access restricted to: staff users only until 11 October 2026 (embargo date). License: Creative Commons Attribution Non-commercial No Derivatives. Download (6MB) |
Abstract: |
As data and model complexity grow, centralized training runs into GPU storage and computing constraints. To address these challenges, Distributed Training (DT) has become a common approach: it leverages parallelization (e.g., of data and tensors) together with dedicated network communication libraries, referred to as Collective Communications (CCs). However, this shift introduces new networking challenges, such as how to exchange packets between GPUs efficiently to reduce in-network congestion and how to tailor CCs to the underlying data center fabric. This thesis first presents the main parallelization techniques and how they address the limitations of centralized approaches. Next, it covers Collective Communication Libraries (CCLs), providing an overview of the most advanced existing solutions, such as NCCL and TECCL. Because the complexity of these DT systems makes them hard to model, it is preferable to rely on a dedicated testing environment, whose implementation details are discussed in this thesis. Finally, it gives an overview of open research directions as well as the ongoing work that I will continue to address during my PhD. |
---|---|
Supervisors: | Fulvio Giovanni Ottavio Risso |
Academic year: | 2024/25 |
Publication type: | Electronic |
Number of pages: | 74 |
Subjects: | |
Degree programme: | Master's degree programme in Computer Engineering (Ingegneria Informatica) |
Degree class: | New system > Master's degree > LM-32 - COMPUTER ENGINEERING |
Joint supervision institution: | INSTITUT EURECOM (FRANCE) |
Collaborating companies: | Huawei Technologies France S.A.S.U |
URI: | http://webthesis.biblio.polito.it/id/eprint/35473 |