Optimizing YOLO Inference for Hardware Constraints Through Quantization Techniques

Niccolo Cacioli


Supervisors: Luciano Lavagno, Teodoro Urso. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

This thesis investigates the application of YOLO (You Only Look Once) models to object detection, with a particular focus on quantizing these models for efficient deployment on edge devices and other resource-constrained hardware platforms. Model quantization plays a critical role in reducing memory footprint and computational cost while aiming to preserve the accuracy and robustness of the original floating-point networks. The work builds a complete training and evaluation pipeline, including data pre-processing compliant with widely adopted standards (e.g. the YOLO format) and automated tools for ground-truth visualization and validation. Various training strategies were explored to enhance model performance, including hyperparameter tuning, architectural modifications, and data augmentation techniques.

A central contribution of the work is the design of a modular quantization workflow that leverages ONNX-compatible tools and is tailored for deployment on hardware-accelerated inference platforms.
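
To make the memory argument in the abstract concrete, the following is a minimal sketch of per-tensor affine quantization to 8-bit unsigned integers, written in plain Python. It is a generic illustration, not the workflow developed in the thesis: the function names `quantize_affine` and `dequantize_affine`, the sample weights, and the choice of an asymmetric uint8 scheme are all assumptions made for this example.

```python
from array import array


def quantize_affine(values):
    """Quantize floats to uint8 with one scale and zero point for the whole tensor.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    qmin, qmax = 0, 255
    # Extend the range to include 0.0 so that zero maps to an exact integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Fall back to a scale of 1.0 if every value is zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = max(qmin, min(qmax, int(round(qmin - lo / scale))))
    quantized = [
        max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values
    ]
    return array("B", quantized), scale, zero_point


def dequantize_affine(quantized, scale, zero_point):
    """Map uint8 codes back to approximate float values."""
    return [scale * (q - zero_point) for q in quantized]


# Sample float32 weights (made up for this example).
weights = array("f", [-1.0, -0.25, 0.0, 0.5, 1.5])

q, scale, zero_point = quantize_affine(list(weights))
restored = dequantize_affine(q, scale, zero_point)

# Storage per element: 4 bytes for float32 versus 1 byte for uint8.
bytes_fp32 = weights.itemsize * len(weights)
bytes_int8 = q.itemsize * len(q)

# Worst-case reconstruction error over the sample.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"fp32 bytes: {bytes_fp32}, uint8 bytes: {bytes_int8}")
print(f"scale: {scale:.6f}, zero point: {zero_point}, max error: {max_error:.6f}")
```

The sketch shows the basic trade-off: each value shrinks from four bytes to one, at the cost of a rounding error on the order of the scale. Practical toolchains typically add calibration data, per-channel scales, and operator-level handling on top of this idea.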

Supervisors

Luciano Lavagno, Teodoro Urso

Academic year

2024/25

Publication type

Electronic

Number of pages

Degree programme

Master's degree programme in Computer Engineering (Ingegneria Informatica)

Degree class

New system > Master's degree > LM-32 - Computer Engineering

URI

https://webthesis.biblio.polito.it/id/eprint/36358
