
Optimization and Quantization on Hardware Accelerators of Semantic Segmentation Neural Networks

Leonardo Rolandi

Optimization and Quantization on Hardware Accelerators of Semantic Segmentation Neural Networks.

Supervisors: Giuseppe Bruno Averta, Carlo Masone. Politecnico di Torino, Master's degree in Ingegneria Informatica (Computer Engineering), 2023

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
Download (6MB)
Abstract:

Adapting Deep Neural Networks to edge devices, including hardware accelerators as done in this thesis, is in general very challenging due to specific low-level constraints, such as limited memory, maximum allowed layer dimensions, and the restricted set of layers and high-level architectures actually implemented in hardware. In the context of semantic segmentation, where many parameters are needed to classify every pixel of the image accurately, the problem of limited memory is particularly acute. If the network's transposition to the edge is done properly, however, the overall process is worthwhile, because it can offer benefits such as faster performance, decreased power usage, reduced latency and enhanced parallelism. Furthermore, unlike typical cloud paradigms, it depends far less on data-traffic bandwidth limits and is more reliable in maintaining security and privacy. The initial objective of this study is to select a semantic segmentation architecture that is suitable for hardware adaptation, without being too large or complex, yet capable enough to perform its task effectively. The selected network is then carefully trained and analyzed to identify properties that can be leveraged in the subsequent optimization and quantization phase. The focus of this work is on compressing the selected architecture and adapting it as well as possible to edge devices. Both quantization-aware retraining and, especially, post-training quantization are tested, with the latter involving a guided search using a custom Genetic Algorithm to find near-optimum quantization configurations. The results demonstrate that Deep Neural Networks contain redundant information and that, by carefully compressing and optimizing them, their effectiveness is not compromised.
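The GA-guided search mentioned above can be illustrated with a minimal sketch: a genetic algorithm evolving per-layer bit-width assignments. Everything here is an illustrative assumption, not the thesis's actual implementation — in particular, `fitness` is a placeholder stand-in for the real step of evaluating a quantized network on a validation set, and the layer count, bit-width choices and boundary-layer penalty are invented for the example.

```python
import random

# Hypothetical setup: 8 layers, each quantizable to 4, 8 or 16 bits.
N_LAYERS = 8
BIT_CHOICES = [4, 8, 16]

def fitness(config):
    # Placeholder objective (stand-in for validation-set accuracy):
    # reward smaller models, but penalize aggressive quantization of
    # the first and last layers, a common heuristic in post-training
    # quantization. Higher is better.
    size_cost = sum(config) / (16 * len(config))
    boundary_penalty = sum(0.1 for i in (0, len(config) - 1) if config[i] < 8)
    return 1.0 - size_cost - boundary_penalty

def crossover(a, b):
    # Single-point crossover between two bit-width configurations.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(config, rate=0.1):
    # Randomly reassign each gene (layer bit-width) with small probability.
    return [random.choice(BIT_CHOICES) if random.random() < rate else g
            for g in config]

def evolve(pop_size=20, generations=30, seed=0):
    random.seed(seed)
    pop = [[random.choice(BIT_CHOICES) for _ in range(N_LAYERS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: keep the best half, refill with mutated offspring.
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print("best per-layer bit-widths:", best)
```

In a real pipeline the fitness call would quantize the network with the candidate configuration and measure accuracy on held-out data, which is what makes the search expensive and a guided heuristic like a GA attractive compared to exhaustive enumeration of all bit-width combinations.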

Supervisors: Giuseppe Bruno Averta, Carlo Masone
Academic year: 2023/24
Publication type: Electronic
Number of pages: 53
Subjects:
Degree programme: Master's degree in Ingegneria Informatica (Computer Engineering)
Degree class: New regulations > Master's degree > LM-32 - INGEGNERIA INFORMATICA
Collaborating companies: STMICROELECTRONICS srl
URI: http://webthesis.biblio.polito.it/id/eprint/29531