WinoAdapt: End-to-End Winograd-based FPGA accelerator for Quantized Convolutional Neural Networks

Alessandra Vignoli

Supervisors: Maurizio Martina, Claudio Passerone, Guido Masera, Pierpaolo Mori', Emanuele Valpreda. Politecnico di Torino, Master's degree programme in Electronic Engineering (Corso di laurea magistrale in Ingegneria Elettronica), 2023

License: Creative Commons Attribution Non-commercial No Derivatives.

Convolutional Neural Networks (CNNs) are a class of Neural Networks (NNs) that compute their outputs through a convolution between a set of 3D inputs and a 4D filter tensor. They are used in many fields, such as image processing and classification, speech recognition and object detection, but their high prediction accuracy comes at the cost of high computation and memory demands and long inference times. Many attempts have been made to identify the best hardware platform and to find strategies that accelerate CNN inference while limiting hardware complexity and improving flexibility. FPGAs have recently been the platform of choice because they offer a good compromise between flexibility and energy efficiency; quantization has been shown to reduce computational complexity while maintaining acceptable accuracy; loop unrolling can increase computational parallelism and speed up inference; and the number of required multiplications can be reduced through computational transforms such as the Winograd algorithm, which can reduce the number of multiplications by up to 4x for filters of size (4x4) or smaller. Recent works have described a new version of the algorithm, referred to as complex Winograd, that makes use of complex numbers. Using complex Winograd with 8-bit quantized data on a layer with ifx=ify=54, ich=och=32 and a (3x3) filter yields an output numerical error with a mean of 0.36 and a standard deviation of 2.98 with respect to the results of standard convolution. Coupling complex Winograd with Karatsuba's algorithm reduces the number of multiplications by 3x with respect to standard convolution. State-of-the-art accelerators such as WRA use standard Winograd and quantized data, and increase their flexibility either by implementing an input decomposition paradigm or by supporting standard convolution through additional hardware resources.
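
To make the multiplication saving concrete, the following is a minimal sketch of the standard (real-valued) Winograd F(4x4, 3x3) transform, checked against direct convolution. It is not the thesis implementation: the complex Winograd variant, 8-bit quantization and Karatsuba's algorithm are deliberately left out, exact rational arithmetic is used instead of fixed point, and the function names are hypothetical. The transform matrices are the commonly published ones for interpolation points 0, ±1, ±2 and infinity.

```python
from fractions import Fraction as F

# Commonly published F(4,3) transform matrices (assumed here, not taken from the thesis).
BT = [[4, 0, -5, 0, 1, 0],
      [0, -4, -4, 1, 1, 0],
      [0, 4, -4, -1, 1, 0],
      [0, -2, -1, 2, 1, 0],
      [0, 2, -1, -2, 1, 0],
      [0, 4, 0, -5, 0, 1]]
G = [[F(1, 4), 0, 0],
     [F(-1, 6), F(-1, 6), F(-1, 6)],
     [F(-1, 6), F(1, 6), F(-1, 6)],
     [F(1, 24), F(1, 12), F(1, 6)],
     [F(1, 24), F(-1, 12), F(1, 6)],
     [0, 0, 1]]
AT = [[1, 1, 1, 1, 1, 0],
      [0, 1, -1, 2, -2, 0],
      [0, 1, 1, 4, 4, 0],
      [0, 1, -1, 8, -8, 1]]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def winograd_f4x4_3x3(tile, kernel):
    """Map a 6x6 input tile and a 3x3 kernel to a 4x4 output tile."""
    u = matmul(matmul(G, kernel), transpose(G))    # transformed kernel, 6x6
    v = matmul(matmul(BT, tile), transpose(BT))    # transformed input, 6x6
    # The only general multiplications: 36 element-wise products.
    m = [[u[i][j] * v[i][j] for j in range(6)] for i in range(6)]
    return matmul(matmul(AT, m), transpose(AT))    # inverse transform, 4x4


def direct_conv(tile, kernel):
    """Reference sliding-window convolution: 16 outputs x 9 products = 144."""
    return [[sum(tile[r + i][c + j] * kernel[i][j]
                 for i in range(3) for j in range(3))
             for c in range(4)] for r in range(4)]


# Small integer test data, standing in for quantized activations and weights.
tile = [[(3 * r + 5 * c) % 7 - 3 for c in range(6)] for r in range(6)]
kernel = [[1, -2, 0], [3, 1, -1], [0, 2, 1]]
assert winograd_f4x4_3x3(tile, kernel) == direct_conv(tile, kernel)
```

Each 4x4 output tile needs 36 element-wise products instead of 144, which is the 4x figure quoted above. The kernel transform contains the fractions 1/6, 1/12 and 1/24, which cannot be represented exactly in narrow fixed-point formats.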
This work presents an FPGA-based hardware accelerator for 8-bit quantized data that exploits the complex Winograd algorithm. The design supports convolution in stride 1 layers with filters of size up to (4x4) and in stride 2 layers with filters of size up to (8x8), and achieves this flexibility without additional hardware for standard convolution. The main computational unit is a Processing Element based on the F(4,3) Winograd algorithm which, thanks to a protocol for reusing the transformation matrices, supports a wider range of kernel sizes than other state-of-the-art designs. Tiling, loop unrolling and data reuse are also used to improve overall performance, and transforming the data in real time reduces the memory demands. The architecture comprises two Organization elements, responsible for decomposing the inputs and the weights, and four Processing Elements working in parallel. This configuration speeds up the computation of stride 1 layers and makes it possible to support stride 2 convolution. The accelerator targets the Xilinx ZCU102 and ZCU104 Evaluation Boards. Hardware execution shows that increasing the number of Processing Elements working in parallel reduces the computation time, and that the numerical error of the accelerator output is comparable to the one estimated by the model.
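
The abstract does not spell out how inputs and weights are decomposed for stride 2 layers. The sketch below assumes a polyphase (even/odd) decomposition, one common way to rewrite a stride 2 convolution as a sum of stride 1 convolutions with half-size kernels. It is shown in 1D for brevity and the function names are hypothetical.

```python
def conv1d(x, w, stride=1):
    """Valid sliding-window convolution (no kernel flip) of list x with kernel w."""
    n_out = (len(x) - len(w)) // stride + 1
    return [sum(x[stride * n + k] * w[k] for k in range(len(w)))
            for n in range(n_out)]


def conv1d_stride2_decomposed(x, w):
    """Stride-2 convolution as the sum of two stride-1 convolutions.

    Even-indexed inputs meet even-indexed taps and odd-indexed inputs meet
    odd-indexed taps, so each sub-convolution uses about half of the kernel.
    """
    y_even = conv1d(x[0::2], w[0::2])   # even phase, stride 1
    y_odd = conv1d(x[1::2], w[1::2])    # odd phase, stride 1
    n_out = (len(x) - len(w)) // 2 + 1
    return [y_even[n] + y_odd[n] for n in range(n_out)]


# An 8-tap stride-2 kernel splits into two 4-tap stride-1 kernels.
x = [(7 * i) % 11 - 5 for i in range(20)]
w = [1, -2, 3, 0, -1, 2, 1, -3]
assert conv1d_stride2_decomposed(x, w) == conv1d(x, w, stride=2)
```

In 2D the same idea gives four phase combinations (even/odd rows by even/odd columns), so under this assumption an (8x8) stride 2 filter becomes four (4x4) stride 1 sub-convolutions. That would be consistent with the four parallel Processing Elements and the (4x4) stride 1 limit described above, although the thesis itself should be consulted for the actual scheme.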

Supervisors: Maurizio Martina, Claudio Passerone, Guido Masera, Pierpaolo Mori', Emanuele Valpreda
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 80
Degree programme: Master's degree programme in Electronic Engineering (Corso di laurea magistrale in Ingegneria Elettronica)
Degree class: New organization > Master science > LM-29 - ELECTRONIC ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/26730