Deploying Run-Time Adaptive Binarized Neural Network in Programmable Data Planes

Simone Geraci

Deploying Run-Time Adaptive Binarized Neural Network in Programmable Data Planes.

Supervisors: Alessio Sacco, Guido Marchetto, Flavio Esposito. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Switches and other network devices process data at wire rate, meaning they can handle packets at the maximum capacity of the data-link connection. Modern switches separate functionality into two layers: the control plane, which makes slower, high-level decisions (e.g., computing forwarding tables), and the data plane, the hardware-accelerated path through which each packet is actually processed (e.g., forwarding packets to output ports). In recent years, with the rise of Programmable Data Planes (PDPs), a major research trend has explored how deep neural network (DNN) models can address long-standing network challenges (e.g., flow classification, anomaly detection) when deployed within PDPs. However, deploying DNNs directly on PDPs is challenging due to limited memory and computational resources, the lack of support for neural network–oriented operations, and the need to maintain line-rate packet processing speed. Our work introduces a split-inference architecture that addresses key challenges found in existing in-network deep learning approaches. We focus on the anomaly detection use case, where the objective is to classify network flows as benign or malicious using flow statistics as features. We propose an inference framework that integrates two different DNNs: a Binarized Neural Network (BNN) deployed entirely in the data plane, and a more complex high-precision model operating in the control plane. The two models are linked through a fused training strategy based on Knowledge Distillation (KD): the quantized model is trained using both the ground truth and the full-precision model’s predictions. In this way, we “guide” the binarized model to mimic the behavior of a deeper, denser network.
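The distillation objective described above can be sketched as a weighted sum of a hard-label term and a soft-target term. This is a minimal illustration, not the thesis's actual loss: the Hinton-style formulation, the function names, and the `alpha` and `temperature` parameters are all assumptions introduced here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Hypothetical KD loss: alpha * CE(ground truth) + (1 - alpha) * T^2 * KL(teacher || student).

    alpha and temperature are illustrative hyperparameters, not values from the thesis.
    """
    # Hard-label term: cross-entropy of the student against the ground truth.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # Soft-target term: KL divergence between softened teacher and student outputs.
    p_teacher_soft = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher_soft, p_student_soft) if t > 0)

    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl
```

When the student already matches the teacher, the soft-target term vanishes and only the ground-truth term remains; a student that disagrees with both the label and the teacher is penalized by both terms.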
During the inference phase, we select critical samples based on an in-network confidence score and identify the most relevant flow features according to recent traffic; both feed an adaptive learning mechanism that continuously refines the in-switch model from the control plane. As a result, our solution adapts dynamically to evolving conditions, which prevents accuracy degradation and facilitates long-term performance improvements in dynamic environments. This thesis documents the key aspects of in-network machine learning, detailing the implementation, the architectural decisions, and the results obtained. In summary, the overall classification performance gap between BNNs and DNNs was no more than a few percentage points in favor of the latter; however, BNNs outperformed DNNs in terms of CPU efficiency and memory consumption. The programmable switch architecture we targeted was Intel’s Tofino ASIC, one of the fastest switch ASICs on the market. Achieving a complete forward pass in a single packet traversal was not feasible; to address this limitation, we exploited the recirculation and mirroring mechanisms provided by the device. Our evaluations indicated that, in some cases, combining knowledge distillation with quantization-aware training led to faster convergence and improved accuracy. Under realistic and dynamic traffic conditions, our system demonstrated strong adaptability to distribution shifts, owing to the implemented refinement mechanism. Starting from these results, the work can be extended in several directions. For example, increasing the bit width of the quantized neural network could further improve its classification performance.
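To make the in-switch inference and sample selection more concrete, the sketch below evaluates a tiny binarized network with XNOR and popcount, then flags low-confidence flows for the control plane. It is a software illustration under stated assumptions: the abstract does not define the confidence score, so the normalized margin between the two highest output scores used here is hypothetical, as are all function names and the `threshold` parameter. Recirculation and mirroring on the Tofino pipeline are not modeled.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (a set bit encodes +1)."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def binary_neuron(x_bits, w_bits, n_bits):
    """Dot product of two +1/-1 vectors of length n_bits via XNOR and popcount."""
    mask = (1 << n_bits) - 1
    agreements = bin(~(x_bits ^ w_bits) & mask).count("1")
    return 2 * agreements - n_bits


def binary_layer(x_bits, weight_rows, n_bits):
    """Apply a sign activation to every neuron and re-pack the outputs as bits."""
    return pack_bits([1 if binary_neuron(x_bits, w, n_bits) >= 0 else -1
                      for w in weight_rows])


def classify_with_confidence(x_bits, hidden_weights, output_weights,
                             n_inputs, threshold):
    """Return (label, confidence, needs_refinement) for one flow.

    Confidence is an assumed metric: the gap between the two highest
    output scores, scaled to the range [0, 1].
    """
    hidden = binary_layer(x_bits, hidden_weights, n_inputs)
    n_hidden = len(hidden_weights)
    scores = [binary_neuron(hidden, w, n_hidden) for w in output_weights]
    label = scores.index(max(scores))
    ranked = sorted(scores, reverse=True)
    confidence = (ranked[0] - ranked[1]) / (2 * n_hidden)
    return label, confidence, confidence < threshold
```

Flows for which `needs_refinement` is true would be the "critical samples" forwarded to the control plane, where the full-precision model can relabel them and drive an update of the in-switch weights.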
Supervisors:	Alessio Sacco, Guido Marchetto, Flavio Esposito
Academic year:	2025/26
Publication type:	Electronic
Number of pages:	78
Subjects:
Degree programme:	Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class:	New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Joint supervision institution:	Saint Louis University (UNITED STATES OF AMERICA)
Collaborating organizations:	Saint Louis University
URI:	http://webthesis.biblio.polito.it/id/eprint/37664
