Simone Romeo
ARISE: Integration of an Approximate and Reconfigurable Spatial Array on a RISC-V-Based Platform.
Advisors: Maurizio Martina, Guido Masera, Michele Caon, Emanuele Valpreda, Flavia Guella. Politecnico di Torino, Master's degree course in Electronic Engineering (Ingegneria Elettronica), 2024
Abstract:
Leveraging recent advancements in computational power, data availability, and algorithmic techniques, Convolutional Neural Networks (CNNs) have become widely adopted for processing images and audio thanks to their efficient handling of grid-like data. However, CNNs, and Deep Neural Networks (DNNs) in general, demand extensive computational resources and memory, affecting both power efficiency and hardware cost. The growing use of CNNs on edge devices has driven the development of specialized hardware accelerators, such as systolic arrays, which reduce data transfer overhead and improve performance by efficiently handling the high parallelism and data throughput these workloads demand. To meet the resource and power constraints of IoT and edge devices, hardware-software optimization techniques such as quantization and approximate computing are further employed. Quantization reduces DNN precision to decrease the memory footprint and hardware requirements, while approximation further reduces computation cost in terms of area and power, trading some accuracy for efficiency. In this context, this work proposes the integration and optimization of an approximate CNN accelerator within a RISC-V Micro-Controller Unit (MCU) through a standardized master/slave interface. Specifically, the adopted MCU, X-HEEP (eXtendible Heterogeneous Energy-Efficient Platform), is an open-source, configurable platform that can easily be extended to integrate low-power edge accelerators. For this work, an existing accelerator, SAURIA (Systolic Array tensor Unit for aRtificial Intelligence Acceleration), has been employed, using integer arithmetic and an 8x8 array. Within SAURIA, the PEs' multiplier has been replaced with a run-time reconfigurable signed multiplier that provides 256 approximation levels and selectable operand precision. The performance of the resulting system is compared with that of the baseline X-HEEP with the cv32e20 core for different convolution dimensions. The introduction of the accelerator yields a speed-up of about 584x when running a convolutional layer with a 32x32 input, 6 input channels, a 3x3 kernel, 8 output channels, and padding and stride both equal to 1. The additional approximation logic costs about 5.7% extra area and does not affect the critical path delay of the accelerator. On the other hand, it guarantees an average power saving of about 38% in the most approximate configuration compared to an 8x8 signed exact multiplier optimized through the Synopsys DesignWare library. In conclusion, by achieving substantial computational speed-up and significant power savings with minimal area overhead, the proposed system is well suited for CNN inference on resource-constrained devices.
Advisors: Maurizio Martina, Guido Masera, Michele Caon, Emanuele Valpreda, Flavia Guella
Academic year: 2024/25
Publication type: Electronic
Number of pages: 92
Additional information: Confidential (embargoed) thesis. Full text not available.
Subjects:
Degree course: Master's degree course in Electronic Engineering (Ingegneria Elettronica)
Degree class: New regulations > Master's degree > LM-29 - ELECTRONIC ENGINEERING
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/33997