A Python-based Hardware Generation Framework for Tensor Systolic Accelerators

Nicole Dai Pra

Supervisor: Andrea Calimera. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2021

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Accelerating Deep Neural Networks (DNNs) with custom hardware is an attractive way to meet stringent application constraints, especially in mobile/IoT inference scenarios where energy and area efficiency are crucial. Custom hardware is commonly developed through an iterative process in which designers identify the main computational and memory patterns of DNN workloads, implement specific hardware structures, and assess the end-to-end performance. As new classes of DNNs are constantly developed and reconfigurable platforms such as FPGAs and CGRAs allow the silicon to be customized after fabrication, agile automation tools are needed to navigate the design space quickly. To this end, this work proposes a Python-based framework for generating tensor systolic arrays, a class of accelerators widely used to perform matrix multiplication, a key operation in DNN workloads. The framework leverages the metaprogramming capabilities of an HDL embedded in Python to minimize design and verification effort. Specifically, smart systolic array templates let the user focus on designing and verifying new processing elements (PEs), leaving the creation of the routing fabric, the control unit, and the integration tests to the generation framework. The framework is used to perform a design space exploration on the Zynq UltraScale+ MPSoC ZCU104 Evaluation Board, assessing the effect of several knobs, namely array size, data bitwidth, PE structure, and sparsity support, on area occupation, power consumption, and latency. The results reveal non-trivial trade-offs, motivating the need for such agile design tools to keep raising the efficiency of domain-specific accelerators.
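
To make concrete how a systolic array performs matrix multiplication, the following is a minimal software sketch in plain Python. It is not the thesis framework and not HDL: it is a cycle-level model of one common dataflow (output-stationary), chosen here as an assumption, in which each PE keeps its own accumulator while operands from the first matrix move right and operands from the second matrix move down. The function name `systolic_matmul` and its internal variables are hypothetical and used only for illustration.

```python
def systolic_matmul(a, b):
    """Cycle-level model of an output-stationary systolic array computing a @ b.

    Illustrative sketch only: PE(i, j) accumulates c[i][j]. Rows of `a` enter
    from the left edge and columns of `b` enter from the top edge, each skewed
    by one cycle per row/column so that matching operands meet in each PE.
    Returns the result matrix and the number of simulated cycles.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of a and b do not match")

    acc = [[0] * cols for _ in range(rows)]    # per-PE accumulators
    a_reg = [[0] * cols for _ in range(rows)]  # operands travelling right
    b_reg = [[0] * cols for _ in range(rows)]  # operands travelling down

    cycles = rows + cols + inner - 2  # time for the last operand pair to reach the far-corner PE
    for t in range(cycles):
        next_a = [[0] * cols for _ in range(rows)]
        next_b = [[0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                if j == 0:
                    k = t - i  # skewed injection on the left edge
                    next_a[i][j] = a[i][k] if 0 <= k < inner else 0
                else:
                    next_a[i][j] = a_reg[i][j - 1]  # take from left neighbour
                if i == 0:
                    k = t - j  # skewed injection on the top edge
                    next_b[i][j] = b[k][j] if 0 <= k < inner else 0
                else:
                    next_b[i][j] = b_reg[i - 1][j]  # take from upper neighbour
        a_reg, b_reg = next_a, next_b

        # Every PE performs one multiply-accumulate per cycle.
        for i in range(rows):
            for j in range(cols):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles


result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, cycles)
```

In this model the array size is fixed by the shapes of the inputs, and the cycle count grows as rows + columns + inner dimension - 2, which gives a rough feel for how array size relates to latency. Area and power, the other metrics named in the abstract, cannot be estimated from a software model like this one.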

Supervisors: Andrea Calimera
Academic year: 2021/22
Publication type: Electronic
Number of pages: 85
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/21131