Politecnico di Torino (logo)

Efficient Tiling Architecture for Scalable CNN Inference: Leveraging High-Level Design and Embedded Scalable Platform (ESP)

Diego Ricardo Bueno Pacheco

Rel. Mario Roberto Casu, Luca Urbinati. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Elettronica (Electronic Engineering), 2023

License: Creative Commons Attribution Non-commercial No Derivatives.

High-Level Design of Hardware Accelerators of Typical DNN Layers using ESP

In recent years, Convolutional Neural Networks (CNNs) have gained significant prominence across computer vision and deep learning applications, driving notable advances in fields such as image classification, object detection, face recognition, and medical imaging. As demand for these applications continues to rise, so does the need for efficient and scalable solutions that meet the computational requirements of state-of-the-art CNN models. Moreover, the push to extend these applications to a broader range of devices, without relying on the cloud, has shifted computation from cloud servers to edge devices. This transition poses a formidable challenge because edge devices have limited computational and memory resources. Addressing it requires hardware accelerators and the partitioning of tensors into manageable tiles that fit within memory constraints.

In this context, this thesis presents a tiling architecture tailored to large-scale Convolutional Neural Network (CNN) inference, with particular emphasis on High-Level Design and the Embedded Scalable Platform (ESP). High-Level Design describes hardware through more abstract, higher-level functional representations and architectural constraints, which translate into concise, debug-friendly C++ code. ESP simplifies the integration process by embedding the architecture in a System-on-Chip (SoC) comprising at least one RISC-V processor, one or more external memory tiles, one or more accelerator tiles, and an I/O tile. ESP also streamlines accelerator design in C/C++ using a variety of High-Level Design tools, enabling straightforward integration and testing of bare-metal software applications.

The thesis introduces a tiling algorithm that accounts for several critical factors: the organization and addressing of tensors in external memory, the maximum number of processing elements available in the accelerator design, the required precision of Multiply-Accumulate (MAC) operations (e.g., 16, 8, or 4 bits), and the sizes of the accelerator's private local memories (PLMs). Each factor is incorporated into the algorithm to optimize performance and resource utilization. In addition, a quantization step reduces the bitwidths of multipliers and accumulators, achieving efficiency gains without a significant loss of accuracy. The architecture is validated through RTL simulation and FPGA deployment, demonstrating its feasibility and effectiveness in real-world applications.

Relators: Mario Roberto Casu, Luca Urbinati
Academic year: 2023/24
Publication type: Electronic
Number of Pages: 113
Degree programme: Corso di laurea magistrale in Ingegneria Elettronica (Electronic Engineering)
Degree class: New organization > Master of Science > LM-29 - ELECTRONIC ENGINEERING
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/29513