Politecnico di Torino (logo)

High-level design of a Depthwise Convolution accelerator and SoC integration using ESP

Riccardo Capodicasa

High-level design of a Depthwise Convolution accelerator and SoC integration using ESP.

Rel. Mario Roberto Casu. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Elettronica (Electronic Engineering), 2022

PDF (Tesi_di_laurea) - Tesi
Licenza: Creative Commons Attribution Non-commercial No Derivatives.

Download (15MB) | Preview

One of the hardest challenge that industry had to front in the last few years was finding a solution to understand the content of an image in a fully automatic way, this branch of study is called "Computer Vision". According to LDV Capital, the number of cameras around the world will proliferate to at least 220% or 45 billion by 2022. This impressive forecast gives one of the reasons why we need techniques to process and classify images in an effective way. From this, one of the most challenging problem in computer vision is "Object Detection", that is the capability to locate object instances inside an image. In order to solve effectively not only the object detection problem, but also an entire set of very complex problems like speech recognition, in 1950s the computer scientist John McCarthy, coined a totally new paradigm called Artificial Intelligence (AI) or "the science and engineering of creating intelligent machines that have the ability to achieve goals like humans do". The goal of this thesis is to realize an hardware accelerator that implements the Depthwise Convolution algorithm, a light-weight convolution algorithm used in Deep Neural Networks targeting mobile applications. The accelerator is coded in C++, synthesized with High Level Synthesis (HLS) using Catapult HLS and integrated in a System On Chip with a RISC-V processor using ESP (Embedded Scalable Platforms), an open source tool developed by Columbia University. ESP gives the possibility to design accelerators and to integrate them in a SoC, together with processors, memory tiles and input/output interfaces, all connected with a Network On Chip (NoC). After the design phase of the C++ code, we went through the validation, synthesis and simulation steps in order to verify the correct behavior of the Depthwise accelerator. Then we have integrated it into a complete System On Chip (SoC) using the ESP design flow. The realized SoC is composed by our accelerator tile, one memory tile, one I/O tile and one processor tile. In particular the CPU is a 64-bit Ariane RISC V soft-processor. Finally, after a preliminary simulation and validation phase in Modelsim of the complete SoC, we have implemented it into a real FPGA using a proFPGA xc7v2000t. We have tested our Depthwise Convolution baremetal application on the soft-processor. In particular we have measured the execution time of the algorithm for both general purpose CPU and dedicated hardware accelerator and the results have highlighted the differences in terms of speed between our accelerator and the general purpose soft-processor. In fact, using the same convolution parameters, the accelerator execution time is 93.12% lower than the general purpose CPU. As last step we performed a design-space exploration exploiting the flexibility of HLS to quickly change the accelerator design varying different directives. In particular, we tried to apply different architecture optimizations in order to find a Pareto set of possible solutions in the latency vs Area space, spanning from low area and high latency design to high area and low latency design. Thanks to the analysis of this thesis work, a hardware designer will easily find the right Depthwise accelerators to integrate in her ESP-based SoC that satisfies the overall area and latency constraints.

Relators: Mario Roberto Casu
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 106
Corso di laurea: Corso di laurea magistrale in Ingegneria Elettronica (Electronic Engineering)
Classe di laurea: New organization > Master science > LM-29 - ELECTRONIC ENGINEERING
Aziende collaboratrici: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/25410
Modify record (reserved for operators) Modify record (reserved for operators)