Massimo Emiliano
Design of a Spatial Array with Run-Time Reconfigurable Approximate Processing Engines.
Supervisors: Guido Masera, Emanuele Valpreda, Maurizio Martina, Flavia Guella. Politecnico di Torino, Master's degree programme in Electronic Engineering, 2023
Abstract: |
Convolutional neural networks (CNNs) have become a common solution for many Artificial Intelligence (AI) applications, since they can achieve higher task accuracy than human experts or traditional algorithms. However, the billions of arithmetic operations involved in CNN processing demand a computational effort and an amount of energy that do not fit the tight power and area constraints of edge devices. Hence, it is common practice to perform CNN processing in data centers, with considerable overhead in terms of data movement, latency and power consumption. This work proposes the design of an accelerator featuring a reconfigurable approximate multiplier to compute 3D convolutions efficiently, making them implementable on power-constrained devices. The solution leverages approximate computation to trade task accuracy for reduced power consumption and arithmetic area. The multiplier performs 8-bit integer multiplication and supports 11 design-time customizable approximation levels plus 15 additional run-time adjustable ones. Furthermore, it can be configured for 16 different result bit-widths to support multiple degrees of quantization, further reducing power consumption. The accelerator is a reconfigurable spatial array that implements an output-stationary algorithm, optimizing data reuse between adjacent processing engines. The array supports 3D convolution with both 3x3 and 1x1 kernels. Furthermore, it can be reprogrammed to apply different loop tiling and loop unrolling strategies. Both the multiplier alone and the entire array architecture have been tested and characterized at each approximation level by simulation and synthesis. The spatial array has been validated against convolutional layers of different shape and depth described in PyTorch using Adapt.
The multiplier can reduce the power consumption up to 10% of the value obtained with neither fixed nor run-time approximation, at the expense of an increased normalized mean absolute error of 300%. By taking advantage of the loop tiling and unrolling strategies, the proposed design speeds up the computation of convolutional layers compared with a general-purpose unit, making it suitable as an external accelerator coupled to a microprocessor. Finally, by supporting several degrees of quantization and approximate computation, the array can be employed on edge devices for the inference of CNN models in IoT applications. |
---|---|
Supervisors: | Guido Masera, Emanuele Valpreda, Maurizio Martina, Flavia Guella |
Academic year: | 2023/24 |
Publication type: | Electronic |
Number of Pages: | 102 |
Additional Information: | Thesis under embargo. Full text not available |
Subjects: | |
Degree programme: | Master's degree programme in Electronic Engineering |
Degree class: | New organization > Master science > LM-29 - ELECTRONIC ENGINEERING |
Collaborating companies: | Politecnico di Torino |
URI: | http://webthesis.biblio.polito.it/id/eprint/29313 |
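The output-stationary scheme mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis design, whose full text is not available: the names `approx_mul`, `conv_output_stationary`, `tile` and `dropped_bits` are hypothetical, the approximation is a simple stand-in that zeroes the low bits of each product, and stride 1 with no padding and square kernels are assumed. Each tile of output pixels plays the role of a group of processing engines that keep their partial sums locally and write them back once.

```python
def approx_mul(a, b, dropped_bits=0):
    """Hypothetical approximate multiplier for signed 8-bit operands.

    Stand-in technique: zero the `dropped_bits` lowest bits of the exact
    product. dropped_bits=0 gives the exact result.
    """
    assert -128 <= a <= 127 and -128 <= b <= 127
    p = a * b
    return (p >> dropped_bits) << dropped_bits


def conv_output_stationary(ifmap, weights, tile=2, dropped_bits=0):
    """3D convolution, ifmap[c][y][x] * weights[m][c][ky][kx] -> ofmap[m][y][x].

    Assumes stride 1, no padding, square kernels (e.g. 3x3 or 1x1).
    """
    C, H, W = len(ifmap), len(ifmap[0]), len(ifmap[0][0])
    M, K = len(weights), len(weights[0][0])
    OH, OW = H - K + 1, W - K + 1
    ofmap = [[[0] * OW for _ in range(OH)] for _ in range(M)]
    for m in range(M):
        # Loop tiling: process the output plane in tile x tile blocks.
        for y0 in range(0, OH, tile):
            for x0 in range(0, OW, tile):
                ys = range(y0, min(y0 + tile, OH))
                xs = range(x0, min(x0 + tile, OW))
                # One accumulator per (hypothetical) processing engine;
                # partial sums stay here until the output pixel is complete.
                acc = {(y, x): 0 for y in ys for x in xs}
                for c in range(C):
                    for ky in range(K):
                        for kx in range(K):
                            w = weights[m][c][ky][kx]  # one weight shared by the tile
                            # In hardware this inner loop would be unrolled in space.
                            for (y, x) in acc:
                                acc[(y, x)] += approx_mul(
                                    ifmap[c][y + ky][x + kx], w, dropped_bits
                                )
                # Each output pixel is written back exactly once.
                for (y, x), value in acc.items():
                    ofmap[m][y][x] = value
    return ofmap
```

With `dropped_bits=0` the result does not depend on `tile`, which only changes the order in which output pixels are produced. Raising `dropped_bits` shows how an approximation error in the multiplier propagates into the accumulated outputs.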