
Efficient Mixed-Precision Quantization of Deep Neural Networks for Edge Applications

Yuliang Chen

Efficient Mixed-Precision Quantization of Deep Neural Networks for Edge Applications.

Supervisor: Mario Roberto Casu. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Elettronica (Electronic Engineering), 2024

PDF (Tesi_di_laurea). License: Creative Commons Attribution Non-commercial No Derivatives. Download (7MB).
Abstract:

This thesis explores the impact of mixed-precision quantization (MPQ) on deploying deep neural networks (DNNs) in edge applications. The research aims to reduce computational complexity during inference on embedded devices by restricting scaling factors to powers of two, so that the multiplications required for rescaling can be replaced by efficient shift operations. This approach reduces computational cost and energy consumption, but it can narrow the quantization range and thereby affect model accuracy.

The study involved training several models: MobileNetV1, MobileNetV2, an auto-encoder, EfficientNet, ResNet, and a CNN for a keyword spotting (KWS) task. All models performed well under MPQ, but only the auto-encoder and the KWS CNN maintained good performance under flat quantization, in which the same quantizer is applied to every layer.

The quantization flow was implemented in QKeras: building on previous work, a new function supporting power-of-two (po2) scaling factors was added. Bayesian optimization via AutoQKeras was then used to search for the best quantizer configuration for each layer, tailoring the quantization to the specific structure of each model.

The results indicate that MPQ incurred a 1-3% accuracy loss compared with traditional floating-point training, while significantly improving computational efficiency. In contrast, models trained with flat quantization (a uniform quantizer across all layers) performed poorly, with the exception of the auto-encoder and the KWS CNN, highlighting the difficulty of training with this approach.

In conclusion, MPQ offers a practical way to reduce computational load with minimal accuracy loss, making it well suited to edge computing. Future work should focus on overcoming the limitations of flat quantization and on further refining MPQ techniques for even better performance.
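The power-of-two scaling idea described in the abstract can be illustrated with a minimal plain-Python sketch (this is an illustration of the general technique, not the thesis's QKeras implementation; all function names here are hypothetical):

```python
import math

def po2_scale_exp(x_max, n_bits):
    # Pick the smallest power-of-two scale 2**exp such that a signed
    # n_bits quantizer covers the range [-x_max, x_max].
    q_max = 2 ** (n_bits - 1) - 1
    return math.ceil(math.log2(x_max / q_max))

def quantize(x, exp, n_bits):
    # Round to the nearest level and clip to the signed integer range.
    q_max = 2 ** (n_bits - 1) - 1
    return max(-q_max - 1, min(q_max, round(x / (2 ** exp))))

def dequantize(q, exp):
    # With a power-of-two scale, rescaling is a bit shift instead of a
    # floating-point multiplication -- the key efficiency gain on edge HW.
    return q << exp if exp >= 0 else q / (1 << -exp)
```

For example, an 8-bit quantizer for values in [-1, 1] gets exponent -6 (scale 1/64), so `quantize(1.0, -6, 8)` yields the integer 64 and dequantization shifts it back to 1.0; values outside the range are clipped to the representable maximum.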

Supervisor: Mario Roberto Casu
Academic year: 2024/25
Publication type: Electronic
Number of pages: 83
Subjects:
Degree programme: Corso di laurea magistrale in Ingegneria Elettronica (Electronic Engineering)
Degree class: New regulations > Master's degree > LM-29 - INGEGNERIA ELETTRONICA
Collaborating companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/32741