Marco Alessio Terlizzi
Mixed-precision Quantization and Inference of MLPerf Tiny DNNs on Precision-Scalable Hardware Accelerators.
Supervisors: Mario Roberto Casu, Luca Urbinati. Politecnico di Torino, Master's degree programme in Ingegneria Elettronica (Electronic Engineering), 2023
PDF (Tesi_di_laurea), 4MB. License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Over the past ten years, Deep Learning has made great strides in a variety of Artificial Intelligence (AI) applications, ranging from image classification to speech recognition. Nevertheless, the unprecedented performance attained by Deep Neural Networks (DNNs) comes at the cost of high computational complexity and power consumption, making them unsuitable for deployment on resource-constrained devices such as embedded hardware. As a result, a field known as TinyML has emerged, aiming to develop efficient and accurate models for the ever-growing market of Internet-of-Things (IoT) devices. Moving both training and inference to the edge offers several advantages, including enhanced data privacy, lower latency, and improved energy efficiency.
These goals are pursued from multiple angles, such as designing networks that execute fewer operations and reducing the precision of network parameters through quantization.
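To make the quantization idea concrete, the sketch below shows uniform symmetric quantization of a weight tensor at two bit-widths, as one might apply per layer in a mixed-precision scheme. This is a minimal illustration under assumed per-tensor scaling, not the actual quantization flow used in the thesis; all function names and parameters are hypothetical.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, n_bits: int):
    """Uniform symmetric quantization of a weight tensor to n_bits.

    Returns integer codes and the scale needed to dequantize.
    (Illustrative sketch; assumes one scale per tensor.)
    """
    qmax = 2 ** (n_bits - 1) - 1           # e.g. 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax       # map the largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

# Example: the same layer quantized at two precisions, as in a
# mixed-precision setting where each layer gets its own bit-width.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(64, 64)).astype(np.float32)
for bits in (8, 4):
    q, s = quantize_symmetric(w, bits)
    err = np.mean(np.abs(w - dequantize(q, s)))
    print(f"{bits}-bit mean abs error: {err:.5f}")
```

Running this shows the reconstruction error growing as the bit-width shrinks, which is the accuracy/efficiency trade-off that mixed-precision quantization balances layer by layer.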
Publication type: Thesis
URI
