Politecnico di Torino

Design and Optimization of a Winograd Aware IP for Quantized Neural Networks

Davide Lezzoche

Design and Optimization of a Winograd Aware IP for Quantized Neural Networks.

Supervisors: Claudio Passerone, Pierpaolo Mori'. Politecnico di Torino, Master's degree course in Electronic Engineering (Ingegneria Elettronica), 2022

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.


Convolutional Neural Networks (CNNs) are increasingly used in deep learning, with applications including computer vision, speech recognition, and image classification. CNNs now reach very high accuracy, at the cost of a huge number of multiplications to perform and parameters to store. The most widely used platforms for accelerating CNNs are GPUs, which offer excellent computing performance. However, their high power consumption makes them a poor choice for embedded applications. FPGAs, by contrast, offer a good compromise between throughput, flexibility, reconfigurability, and energy efficiency. Given the limited resources of FPGAs and the large computational cost and storage demand (both on-chip and off-chip) of CNNs, several optimizations are required to implement an FPGA-based CNN accelerator. Quantization, loop unrolling, and data vectorization help reduce resource demand and speed up inference. The Winograd algorithm can also reduce the computational cost: it greatly decreases the number of multiplications required to perform a convolution, but it introduces a numerical error that is accentuated by quantization. Methods that avoid or limit this error, such as residue number system (RNS) representation and the extension to the complex number field, have been proposed in the literature.

This project presents the design and optimization of a deeply pipelined IP for 8-bit quantized convolutional neural networks that can use the Winograd algorithm to perform convolution when the kernel size is 3x3. To reduce the numerical error introduced by the Winograd algorithm, both the complex number field and RNS approaches were analyzed. The RNS solution avoids the numerical error, but at the cost of a non-negligible increase in the number of multiplications compared to the standard Winograd algorithm. Complex Winograd, on the other hand, minimizes the error with a much smaller increase in multiplications. For these reasons, the complex Winograd algorithm is the one implemented in the designed IP, achieving a 2.57x reduction in multiplications compared with the standard convolution algorithm.
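
The trade-off described in the abstract can be illustrated with a minimal sketch of the textbook one-dimensional Winograd F(2,3) transform. This is not the complex Winograd variant implemented in the thesis, and the function names and the `quantize_filter` flag are hypothetical, chosen only for this illustration. The sketch computes two outputs of a 3-tap filter with 4 multiplications instead of 6, and shows how rounding the transformed weights to integers (a crude stand-in for quantization) changes the result.

```python
from fractions import Fraction


def direct_conv_1d(d, g):
    """Direct 3-tap convolution (CNN-style correlation): 2 outputs, 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g, quantize_filter=False):
    """Winograd F(2,3): the same 2 outputs using 4 element-wise multiplications.

    quantize_filter is an illustrative assumption: when True, the transformed
    weights are rounded to integers, mimicking a naive integer quantization.
    """
    half = Fraction(1, 2)
    # Filter transform; in practice this can be precomputed once per kernel.
    u = [
        Fraction(g[0]),
        (g[0] + g[1] + g[2]) * half,
        (g[0] - g[1] + g[2]) * half,
        Fraction(g[2]),
    ]
    if quantize_filter:
        u = [round(x) for x in u]  # the halves are lost here
    # Input transform: additions and subtractions only.
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # The only 4 general multiplications of the algorithm.
    m = [ui * vi for ui, vi in zip(u, v)]
    # Output transform: additions and subtractions only.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


if __name__ == "__main__":
    d = [1, 2, 3, 4]   # input tile
    g = [1, 2, 4]      # filter taps
    print("direct:            ", direct_conv_1d(d, g))
    print("winograd (exact):  ", [int(y) for y in winograd_f23(d, g)])
    print("winograd (rounded):", winograd_f23(d, g, quantize_filter=True))
```

With exact rational arithmetic the Winograd result matches the direct convolution, while the rounded variant does not. Nesting the same transform in two dimensions gives F(2x2, 3x3), which needs 16 multiplications per 2x2 output tile instead of 36, a 2.25x reduction. Larger tiles save more multiplications but require transform constants that are harder to represent in low-precision integers, which is the error the abstract refers to.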

Supervisors: Claudio Passerone, Pierpaolo Mori'
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 70
Degree course: Master's degree course in Electronic Engineering (Ingegneria Elettronica)
Degree class: New organization > Master science > LM-29 - ELECTRONIC ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/22872