
Optimized VLSI architectures for efficient sparsity exploitation in Deep Learning

Matteo Pellassa, Michele Tomatis


Supervisor: Maurizio Martina. Politecnico di Torino, Master's degree programme in Ingegneria Elettronica (Electronic Engineering), 2021

PDF (Tesi_di_laurea) - Thesis, 4MB. License: Creative Commons Attribution Non-commercial No Derivatives.
Archive (ZIP) (Documenti_allegati) - Other, 150MB. License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:

Artificial intelligence (AI) now plays a prominent role in many areas, including robotics, computer vision for medicine, and autonomous driving. However, the algorithms in this field are highly sophisticated and known to be both compute- and memory-intensive, so techniques that improve efficiency by reducing the number of computations without losing accuracy are becoming critical. This thesis focuses on the convolutional layer of Convolutional Neural Networks (CNNs) and aims to improve efficiency by avoiding useless computations. Starting from the SqueezeFlow architecture, which employs a PT-OS-sparse dataflow to exploit sparsity in the kernel matrices, we develop an architecture that exploits sparsity in the input matrices and therefore employs a PT-KS-sparse dataflow. A two-level memory hierarchy is also introduced to reduce latency and energy consumption during data retrieval. An algorithm that avoids useless computations is first developed in Python; the hardware capable of executing it is then described in VHDL and validated through ModelSim simulations. The proposed accelerator supports convolutions with 3x3 kernels, up to 512 input channels, and input matrices up to 254x254. The circuit is synthesized with Synopsys in 65nm technology, obtaining a netlist that operates at 670MHz; the main metrics (area, throughput, power) are extracted and compared with state-of-the-art accelerators. Finally, several configurations employing multiple accelerators working in parallel are proposed to further improve the system's throughput.
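To illustrate the kind of input-sparsity exploitation the abstract describes, the sketch below shows a direct 3x3 convolution in Python that visits only the non-zero input activations and scatters their contributions to the output maps. This is not the thesis' actual algorithm or the PT-KS-sparse dataflow itself; the function name, array shapes, and NumPy-based implementation are illustrative assumptions.

import numpy as np

def sparse_input_conv3x3(inputs, kernels):
    # Illustrative sketch of input-sparsity-aware convolution (assumed names/shapes).
    # inputs:  (C_in, H, W) input feature maps
    # kernels: (C_out, C_in, 3, 3) convolution weights
    c_in, h, w = inputs.shape
    c_out = kernels.shape[0]
    out = np.zeros((c_out, h - 2, w - 2), dtype=np.result_type(inputs, kernels))
    for c in range(c_in):
        # Visit only the non-zero activations of this input channel and
        # scatter each one to every output position it contributes to.
        for y, x in zip(*np.nonzero(inputs[c])):
            a = inputs[c, y, x]
            for ky in range(3):
                for kx in range(3):
                    oy, ox = y - ky, x - kx
                    if 0 <= oy < h - 2 and 0 <= ox < w - 2:
                        out[:, oy, ox] += a * kernels[:, c, ky, kx]
    return out

# Example usage: a mostly-zero 8x8 single-channel input; only the two
# non-zero pixels are ever visited by the inner loops.
x = np.zeros((1, 8, 8)); x[0, 2, 3] = 1.0; x[0, 5, 5] = -2.0
k = np.random.rand(4, 1, 3, 3)
y = sparse_input_conv3x3(x, k)   # output shape (4, 6, 6)

With highly sparse inputs the multiply-accumulate work scales with the number of non-zero activations rather than with the full feature-map size, which is the source of the computation savings that an input-sparsity-aware accelerator targets.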

Supervisor: Maurizio Martina
Academic year: 2020/21
Publication type: Electronic
Number of pages: 130
Subjects:
Degree programme: Master's degree programme in Ingegneria Elettronica (Electronic Engineering)
Degree class: New regulations > Master's degree > LM-29 - ELECTRONIC ENGINEERING
Partner companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/17817