Francesco Camilli
Distributed AI fabrics: a network-side perspective.
Supervisors: Paolo Giaccone, Emilio Leonardi. Politecnico di Torino, Master's degree programme in Communications Engineering, 2026
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
The rapid scaling of Large Language Models (LLMs) has transformed distributed training into a network-intensive workload, where communication efficiency increasingly dominates overall performance. As model size grows, the exchange of gradients between computing devices becomes a critical bottleneck, especially in geographically distributed or resource-constrained environments. This thesis investigates the impact of network latency and bandwidth constraints on decentralized LLM training, with a specific focus on the Ring All-Reduce gradient synchronization algorithm. To analyze this phenomenon, a controlled experimental environment was implemented using Docker containerization on a single physical server, where multiple nodes are interconnected through a software-defined network to emulate a decentralized topology.
The study characterizes network traffic at the packet level through tcpdump and Wireshark captures.
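The abstract centers on the Ring All-Reduce synchronization algorithm. As a minimal single-process sketch of its chunk schedule (my own illustration, not code from the thesis): each of N nodes splits its gradient into N chunks, runs N-1 scatter-reduce steps in which every node forwards one chunk to its ring successor for accumulation, and then N-1 all-gather steps that circulate the fully reduced chunks.

```python
# Single-process sketch of Ring All-Reduce gradient synchronization
# (illustrative only; real frameworks such as NCCL run the same chunk
# schedule with true point-to-point transfers between devices).
from typing import List

def ring_all_reduce(grads: List[List[float]]) -> List[List[float]]:
    """Sum-reduce N equal-length gradient vectors over a logical ring."""
    n = len(grads)
    length = len(grads[0])
    assert length % n == 0, "vector length must be divisible by the node count"
    size = length // n
    # Split each node's gradient into n chunks.
    chunks = [[g[i * size:(i + 1) * size] for i in range(n)] for g in grads]

    # Phase 1, scatter-reduce: in each of n-1 steps every node sends one
    # chunk to its successor, which adds it element-wise to its own copy.
    # Afterwards node r holds the fully reduced chunk (r + 1) mod n.
    for step in range(n - 1):
        for r in range(n):
            c = (r - step) % n          # chunk node r forwards this step
            dst = (r + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], chunks[r][c])]

    # Phase 2, all-gather: n-1 more steps circulate the reduced chunks
    # so that every node ends up with the complete summed vector.
    for step in range(n - 1):
        for r in range(n):
            c = (r + 1 - step) % n      # reduced chunk node r forwards
            dst = (r + 1) % n
            chunks[dst][c] = list(chunks[r][c])

    # Reassemble each node's chunks into one flat vector.
    return [[x for chunk in node for x in chunk] for node in chunks]

# Example: 4 nodes, node r contributes the constant vector [r] * 8;
# every node ends with the element-wise sum 0 + 1 + 2 + 3 = 6.
result = ring_all_reduce([[float(r)] * 8 for r in range(4)])
assert all(vec == [6.0] * 8 for vec in result)
```

Each node sends and receives only 2(N-1)/N of the gradient size per synchronization regardless of N, which is why the algorithm's sensitivity to per-link latency and bandwidth, rather than to node count, makes it a natural subject for the emulation study described above.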
