Efficient Mixed-Precision Quantization of Deep Neural Networks for Edge Applications

Yuliang Chen

Efficient Mixed-Precision Quantization of Deep Neural Networks for Edge Applications.

Supervisor: Mario Roberto Casu. Politecnico di Torino, Master's degree programme in Electronic Engineering (Ingegneria Elettronica), 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

This thesis explores the impact of mixed-precision quantization (MPQ) on deploying deep neural networks (DNNs) in edge applications. The research aims to reduce computational complexity during inference on embedded devices by constraining scaling factors to powers of two, so that efficient shift operations can replace multiplications. This approach reduces computational cost and energy consumption but can narrow the quantization range, potentially affecting model performance. The study involved training various models, including MobileNetV1, MobileNetV2, an auto-encoder, EfficientNet, ResNet, and a CNN for a keyword spotting (KWS) task.
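The power-of-two idea described above can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the choice to round the scale up to the next power of two, and the round-half-up rescaling are all assumptions made for illustration. It shows a signed integer quantizer whose scale is 2**e, and a rescaling step that uses only an add and a bit shift.

```python
import math


def pow2_scale_exponent(max_abs: float, bits: int) -> int:
    """Exponent e such that the scale 2**e covers max_abs with signed `bits`-bit integers.

    Assumption: the unconstrained scale max_abs / qmax is rounded UP to a power of two.
    """
    qmax = 2 ** (bits - 1) - 1
    return math.ceil(math.log2(max_abs / qmax))


def quantize(x: float, exponent: int, bits: int) -> int:
    """Map a real value to a clamped signed integer using the scale 2**exponent."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    q = round(x / 2.0 ** exponent)
    return max(qmin, min(qmax, q))


def requantize_shift(acc: int, shift: int) -> int:
    """Multiply an integer accumulator by 2**-shift using only add and shift.

    Rounds half up; a non-positive shift becomes a left shift.
    """
    if shift <= 0:
        return acc << -shift
    return (acc + (1 << (shift - 1))) >> shift


# Hypothetical exponents for an activation, a weight, and the layer output.
e_act, e_w, e_out = -6, -7, -5

q_act = quantize(0.5, e_act, bits=8)   # 0.5  / 2**-6
q_w = quantize(0.25, e_w, bits=8)      # 0.25 / 2**-7
acc = q_act * q_w                      # integer product, implicit scale 2**(e_act + e_w)

# Moving to the output scale needs only a shift by the exponent difference.
q_out = requantize_shift(acc, e_out - (e_act + e_w))
print(q_out * 2.0 ** e_out)            # real-valued result of 0.5 * 0.25
```

Because the power-of-two scale generally differs from the ideal scale max_abs / qmax, the representable range no longer matches the data range exactly, which is one way to read the trade-off the abstract mentions.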

While all models performed well under MPQ, only the auto-encoder and the CNN for KWS maintained good performance under flat quantization, where the same quantizer is applied across all layers.
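The distinction between flat and mixed-precision quantization can be sketched as a per-layer bit-width assignment. The layer names, parameter counts, and bit widths below are hypothetical and are not results from the thesis; the sketch only shows how the two configurations differ in form and in weight storage.

```python
# Hypothetical layers and weight counts, for illustration only.
layer_params = {"conv1": 864, "conv2": 18432, "fc": 64000}


def weight_storage_bits(params: dict, bits_per_layer: dict) -> int:
    """Total weight storage in bits for a given per-layer bit-width assignment."""
    return sum(count * bits_per_layer[name] for name, count in params.items())


# Flat quantization: the same bit width for every layer.
flat = {name: 8 for name in layer_params}

# Mixed precision: each layer gets its own bit width (assumed values).
mixed = {"conv1": 8, "conv2": 4, "fc": 2}

flat_bits = weight_storage_bits(layer_params, flat)
mixed_bits = weight_storage_bits(layer_params, mixed)
print(flat_bits, mixed_bits)
```

In an actual MPQ flow the per-layer bit widths would be chosen by a search or training procedure rather than fixed by hand as they are here.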

Publication type

Electronic

URI

https://webthesis.biblio.polito.it/id/eprint/32741
