Post-Training Quantization of a Transformer-based Autonomous Driving Neural Network

Giovanni Gaddi

Supervisors: Mario Roberto Casu, Edward Manca. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

License: Creative Commons Attribution Share Alike.

Abstract:	The ongoing development of Autonomous Driving (AD) systems has increased the demand for perception models that combine high accuracy with computational and energy efficiency. Recent advancements include BEVFusion, a state-of-the-art multi-sensor fusion Neural Network (NN) framework that combines camera and LiDAR data into a unified representation called Bird's-Eye View (BEV). This approach enables robust spatial reasoning and 3D object detection. BEVFusion reaches competitive performance on large-scale benchmarks such as NuScenes, a dataset for multi-modal AD NNs that provides training data from camera and LiDAR sensors with 3D object annotations. However, despite its high accuracy, BEVFusion's computational and memory requirements make real-time deployment on embedded or resource-constrained devices extremely difficult. An important optimization in NN deployment is quantization. The objective of this technique is to replace floating-point arithmetic with integer arithmetic while minimizing the accuracy loss of the resulting NN. The use of integer operations results in Quantized NNs (QNNs) that are lighter and easier to deploy in resource-constrained scenarios. This thesis investigates the application of Post-Training Quantization (PTQ) to BEVFusion to lower its deployment requirements without changing the model structure or retraining it. To this end, I developed a comprehensive framework to apply PTQ, incorporating a per-module calibration strategy that allows for quantization of the NN weights and activations of the most computationally intensive layers. Moreover, to mitigate the common instabilities of PTQ, I linearly vary the scale parameters to find the best value for each quantized layer of this NN. Furthermore, I propose a Mixed-Precision Quantization (MPQ) exploration engine.
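As a rough illustration of the two ideas mentioned above (replacing floating-point values with integers, and linearly varying the scale to find a good value), the following minimal sketch quantizes a list of values symmetrically and sweeps candidate scales. It is not the thesis implementation: the function names (`quantize`, `dequantize`, `search_scale`), the sweep range of 0.5 to 1.0 times the max-abs scale, and the mean-squared-error criterion are all assumptions made for illustration.

```python
def quantize(values, scale, num_bits=8):
    """Symmetric uniform quantization: map floats to clamped signed integers."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integers back to the floating-point domain."""
    return [q * scale for q in q_values]


def mse(a, b):
    """Mean squared error between two equally sized lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def search_scale(values, num_bits=8, steps=100, low=0.5, high=1.0):
    """Linearly sweep candidate scales and keep the one with the lowest error.

    Candidates go from `low` to `high` times the max-abs scale
    (an assumed search range, chosen only for this sketch).
    """
    qmax = 2 ** (num_bits - 1) - 1
    base = max(abs(v) for v in values) / qmax
    if base == 0.0:
        return 1.0, 0.0  # all-zero input: any scale reproduces it exactly
    best_scale, best_err = base, float("inf")
    for i in range(steps + 1):
        scale = base * (low + (high - low) * i / steps)
        restored = dequantize(quantize(values, scale, num_bits), scale)
        err = mse(values, restored)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err


if __name__ == "__main__":
    # Hypothetical data: many small values plus one large outlier.
    data = [0.01 * i for i in range(-50, 51)] + [5.0]
    print(search_scale(data, num_bits=4))
```

Because the sweep includes the max-abs scale itself, the selected scale can never give a higher reconstruction error than the plain max-abs choice on the same data. A smaller scale clips outliers but resolves the bulk of the values more finely, which is the tradeoff the sweep explores.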
MPQ is an established technique that seeks, for each layer of the NN independently, a tradeoff between the number of bits and overall accuracy. To explore this design space, I propose a Genetic Algorithm (GA). GAs are a class of optimization engines well known for their scalability and their ability to handle complex search spaces. The GA-based MPQ design space exploration varies the number of bits of the weights and activations of each quantized layer of the NN, trying to keep the cosine similarity between the activations of the original and the quantized NN as high as possible for each sub-module. Experimental evaluations using NuScenes metrics such as mean Average Precision (mAP) and NuScenes Detection Score (NDS) show that the proposed quantization methods preserve the fundamental perception capabilities of the original design while achieving a significant decrease in model size and computational requirements. Additionally, the study validates the benefits of quantization on state-of-the-art academic and industry AD NNs, establishing a path to deploy these large and complex NNs in an AD edge context.
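To make the GA-based mixed-precision search more concrete, here is a toy sketch under stated assumptions. Each "layer" is just a list of floats standing in for a sub-module's activations, a genome assigns one bit-width per layer, and the fitness rewards high cosine similarity between original and quantized values while penalizing the average bit-width. The candidate bit-widths, the penalty weight, the elitist selection scheme, and all function names are hypothetical and are not taken from the thesis.

```python
import math
import random

BIT_CHOICES = (2, 4, 8)  # assumed candidate bit-widths


def fake_quantize(values, num_bits):
    """Quantize to signed integers with a max-abs scale, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def fitness(genome, layers, bit_penalty=0.01):
    """Mean per-layer cosine similarity minus a penalty on the average bit-width."""
    sims = [cosine_similarity(t, fake_quantize(t, b)) for t, b in zip(layers, genome)]
    return sum(sims) / len(sims) - bit_penalty * sum(genome) / len(genome)


def evolve(layers, population_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Elitist GA: keep the best half, refill with crossover plus mutation."""
    rng = random.Random(seed)
    n = len(layers)
    population = [[rng.choice(BIT_CHOICES) for _ in range(n)]
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, layers), reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # single-point crossover
            child = mother[:cut] + father[cut:]
            child = [rng.choice(BIT_CHOICES) if rng.random() < mutation_rate else b
                     for b in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda g: fitness(g, layers))


if __name__ == "__main__":
    data_rng = random.Random(1)
    toy_layers = [[data_rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(6)]
    print(evolve(toy_layers))  # one bit-width per toy layer
```

A real exploration would compute the similarity on activations gathered from calibration data rather than on synthetic vectors, and would treat weight and activation bit-widths as separate genes. The sketch only shows the shape of the search loop.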
Supervisors:	Mario Roberto Casu, Edward Manca
Academic year:	2025/26
Publication type:	Electronic
Number of pages:	72
Subjects:
Degree programme:	Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class:	New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies:	Politecnico di Torino
URI:	http://webthesis.biblio.polito.it/id/eprint/38631
