A Machine Learning Approach to Optimizing CNN Deployment on Tile-Based Systems-on-Chip

William Baisi

A Machine Learning Approach to Optimizing CNN Deployment on Tile-Based Systems-on-Chip.

Supervisors: Mario Roberto Casu, Luca Carloni. Politecnico di Torino, Master's degree programme in Electronic Engineering (Corso di laurea magistrale in Ingegneria Elettronica), 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Convolutional Neural Networks (CNNs) play a crucial role in many AI applications, such as image recognition and classification. Efficient execution of CNNs on hardware accelerators is critical, particularly in edge computing, where performance, power efficiency, and real-time constraints must be balanced due to limited resources and strict power budgets. This thesis presents an optimization framework for deploying CNN inference tasks on tile-based System-on-Chip (SoC) architectures. The study investigates various hardware configurations, including multiple accelerator tiles, memory bandwidth, computational capabilities, and on-chip local memory capacity, along with different parallelization strategies to efficiently distribute the CNN workload. The experiments were conducted leveraging the Embedded Scalable Platform (ESP), an open-source, tile-based SoC architecture for heterogeneous computing. ESP allows for the integration of custom accelerators connected through a Network-on-Chip (NoC) and provides an automated flow to prototype designs on FPGAs, enabling efficient evaluation of different SoC configurations with various software applications. CNNs exhibit significant variability in complexity across layers. For example, the memory footprint, the ratio of input feature maps (ifmaps) to weight parameters, and the computational intensity can vary substantially between layers. This heterogeneity, combined with the configurable nature of tiled architectures, introduces several trade-offs when optimizing deployment. Each CNN benefits from an optimal selection and distribution of on-chip resources and each layer in the network requires custom resource mapping to achieve optimal performance, making it challenging to determine the best resource allocation and mapping. To address this, a dataset was collected from extensive FPGA experiments, capturing the execution latency of CNN inference tasks across different SoC configurations and mapping strategies. 
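The layer-to-layer variability described above can be made concrete with a little arithmetic. The following is a minimal sketch, not code from the thesis: the `ConvLayer` helper, the "same" padding assumption, the assumption that each tensor is moved once, and the two example layer shapes are all chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Shape of one convolutional layer (hypothetical helper, not from the thesis)."""
    in_ch: int
    out_ch: int
    height: int
    width: int
    kernel: int
    stride: int = 1

    @property
    def out_h(self) -> int:
        # Assumes "same" padding: output height is ceil(height / stride).
        return -(-self.height // self.stride)

    @property
    def out_w(self) -> int:
        # Assumes "same" padding: output width is ceil(width / stride).
        return -(-self.width // self.stride)

    @property
    def ifmap_elems(self) -> int:
        return self.in_ch * self.height * self.width

    @property
    def weight_elems(self) -> int:
        return self.out_ch * self.in_ch * self.kernel * self.kernel

    @property
    def ofmap_elems(self) -> int:
        return self.out_ch * self.out_h * self.out_w

    @property
    def macs(self) -> int:
        # One multiply-accumulate per kernel element per output element.
        return self.ofmap_elems * self.in_ch * self.kernel * self.kernel

    @property
    def ifmap_to_weight_ratio(self) -> float:
        return self.ifmap_elems / self.weight_elems

    @property
    def macs_per_element(self) -> float:
        # Rough computational intensity, assuming each tensor is moved once.
        moved = self.ifmap_elems + self.weight_elems + self.ofmap_elems
        return self.macs / moved


# Illustrative shapes only: a wide, shallow early layer and a small, deep late layer.
early = ConvLayer(in_ch=3, out_ch=64, height=224, width=224, kernel=3)
late = ConvLayer(in_ch=512, out_ch=512, height=14, width=14, kernel=3)

for name, layer in (("early", early), ("late", late)):
    print(name, layer.ifmap_elems, layer.weight_elems,
          round(layer.ifmap_to_weight_ratio, 3), round(layer.macs_per_element, 1))
```

Under these assumptions the early layer is dominated by feature maps and the late layer by weights, which is the kind of contrast that makes a single mapping unlikely to suit every layer.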
While heuristics could help find good mappings, this thesis adopts a Machine Learning (ML) approach, using models trained on empirical performance data to predict optimal mappings. Such models can capture complex relationships between hardware configurations, CNN topologies, and parallelization schemes that traditional heuristics may overlook. Random Forest and Extreme Gradient Boosting models were trained to predict the execution latency of CNN layers mapped onto a given hardware instance. These models were then integrated into a mapping tool designed to select optimal configurations for executing CNN layers on the target SoC. Once trained, the models can generalize to networks with characteristics similar to those in the training set, reducing the need to profile new networks and speeding up the deployment process. In conclusion, this thesis demonstrates that ML models trained on empirical data can optimize CNN deployment on tile-based SoCs without requiring complex system models or heuristics. By leveraging ML and ESP's automated flows, this work enables more efficient CNN deployment in edge computing environments.
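The selection step of such a mapping tool can be sketched as follows: enumerate candidate mappings for a layer, query a latency predictor for each, and keep the one with the lowest predicted latency. This is a minimal sketch under stated assumptions, not the thesis implementation. The `Mapping` fields, the strategy names, and `toy_latency_model` are hypothetical; the toy model is an invented formula standing in for a trained Random Forest or Extreme Gradient Boosting regressor.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Mapping:
    """One candidate mapping of a layer (hypothetical fields)."""
    n_tiles: int    # accelerator tiles used for the layer
    strategy: str   # parallelization scheme label (names are assumptions)


# A latency model is any callable (layer features, mapping) -> predicted latency.
LatencyModel = Callable[[Dict[str, float], Mapping], float]


def candidate_mappings(max_tiles: int, strategies: Sequence[str]) -> List[Mapping]:
    """Enumerate every (tile count, strategy) combination."""
    return [Mapping(n, s) for n, s in product(range(1, max_tiles + 1), strategies)]


def select_mapping(layer: Dict[str, float],
                   candidates: Sequence[Mapping],
                   predict_latency: LatencyModel) -> Mapping:
    """Return the candidate with the lowest predicted latency."""
    return min(candidates, key=lambda m: predict_latency(layer, m))


def toy_latency_model(layer: Dict[str, float], mapping: Mapping) -> float:
    """Invented stand-in for a trained regressor; NOT a model from the thesis."""
    compute = layer["macs"] / mapping.n_tiles        # work split across tiles
    overhead = 1000.0 * (mapping.n_tiles - 1)        # made-up coordination cost
    penalty = 1.0 if mapping.strategy == "split_output_channels" else 1.2
    return compute * penalty + overhead


candidates = candidate_mappings(
    max_tiles=4, strategies=("split_output_channels", "split_rows"))

# Each layer gets its own mapping, since layers differ in size.
network = {"large_layer": {"macs": 1_000_000.0}, "small_layer": {"macs": 1_000.0}}
plan = {name: select_mapping(feats, candidates, toy_latency_model)
        for name, feats in network.items()}

for name, mapping in plan.items():
    print(name, mapping)
```

With this toy model the large layer is spread over all four tiles while the small layer stays on one, because the invented coordination overhead outweighs the benefit of splitting a small workload. In practice the `predict_latency` argument would wrap a regressor trained on measured latencies.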
Supervisors:	Mario Roberto Casu, Luca Carloni
Academic year:	2024/25
Publication type:	Electronic
Number of pages:	95
Subjects:
Degree programme:	Master's degree programme in Electronic Engineering (Corso di laurea magistrale in Ingegneria Elettronica)
Degree class:	New system (Nuovo ordinamento) > Master's degree > LM-29 - INGEGNERIA ELETTRONICA (Electronic Engineering)
Collaborating institutions:	Columbia University
URI:	http://webthesis.biblio.polito.it/id/eprint/33035
