
Hardware Accelerator Sparsity Based for Ultra-Low Power Quantized Neural Networks based on Serial Multipliers

Eleonora Ferrara

Hardware Accelerator Sparsity Based for Ultra-Low Power Quantized Neural Networks based on Serial Multipliers.

Supervisor: Maurizio Martina. Politecnico di Torino, Master's degree programme in Electronic Engineering (Ingegneria Elettronica), 2020

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.


Bit-Serial Multiplier Sparsity Based for a Hardware Accelerator

Applications such as the Deep Learning Enabled Internet of Things (IoT) have become a new way of looking at the future of monitoring, control and automation of the world around us. This thesis focuses on Edge Computing IoT, which moves the analytics from servers to sensors and portable devices, cutting the need for data transmission and wide bandwidth. However, one problem must be solved before this technology can be employed in any application, from Industry 4.0 to medical devices and from smart homes to smart cities: the high power consumption that these applications require.

Previous work on the Multi-Precision Bit-Serial Hardware Accelerator IP for Deep Learning Enabled IoT partially reached this goal by using the PULP system, developed by researchers at ETH Zurich and introduced in more detail later in this thesis. This open-source platform supports the HWPE (Hardware Processing Engine) interface, which has already been integrated as the SMAC Engine. It exploits the cache memory hierarchy to reduce the latency and power consumption associated with handling the large amount of data involved in the convolution operations typical of multilayer neural networks such as CNNs (Convolutional Neural Networks).

The goal of this thesis is a Serial-MAC engine (SMAC Engine): a 65 nm hardware accelerator with 8- or 4-bit parallelism for activations and 8, 6 or 4 bits for weights, optimized for power consumption while paying attention to the trade-off with performance. The bit-serial multiplication approach previously used for this system meets state-of-the-art low-power standards: an energy consumption of only 0.58 pJ/MAC keeps the power budget in the order of mW, which is suitable for IoT.
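As a rough illustration of the bit-serial multiplication idea mentioned above, the following Python sketch consumes one weight bit per step and accumulates shifted copies of the activation. It is only a behavioural model under stated assumptions: the function names `bit_serial_multiply` and `serial_mac` are hypothetical, the choice to serialize the weight (rather than the activation) and to use LSB-first two's complement are assumptions, and nothing here is taken from the actual SMAC Engine design.

```python
def bit_serial_multiply(activation: int, weight: int, weight_bits: int) -> int:
    """Multiply by consuming the signed weight one bit per step, LSB first.

    Assumption: the weight is a two's-complement value of `weight_bits` bits
    (the abstract mentions 8, 6 or 4 bits for weights).
    """
    lo, hi = -(1 << (weight_bits - 1)), (1 << (weight_bits - 1)) - 1
    if not lo <= weight <= hi:
        raise ValueError(f"weight {weight} does not fit in {weight_bits} bits")

    encoded = weight & ((1 << weight_bits) - 1)  # two's-complement bit pattern
    acc = 0
    for i in range(weight_bits):
        if (encoded >> i) & 1:
            partial = activation << i
            # The most significant bit carries a negative weight in two's complement.
            acc += -partial if i == weight_bits - 1 else partial
    return acc


def serial_mac(activations, weights, weight_bits: int) -> int:
    """Multiply-accumulate over paired activations and weights."""
    return sum(
        bit_serial_multiply(a, w, weight_bits)
        for a, w in zip(activations, weights)
    )


print(serial_mac([1, 2, 3], [4, -5, 6], weight_bits=4))  # 4 - 10 + 18 = 12
```

In this model a narrower weight simply means fewer loop iterations, which is one intuitive way to see why a multi-precision bit-serial datapath can trade precision for cycles and energy.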
However, lower power consumption can be achieved by focusing on a particular property of the data matrices involved in the convolution process: sparsity. Exploiting it makes it possible to reduce data memory usage during the operations by means of compression techniques, which are explained later in the thesis.
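The abstract does not say which compression techniques the thesis adopts, so the Python sketch below uses a simple index/value encoding purely as an assumed example of how sparsity can reduce both storage and the number of multiply-accumulate operations. The names `compress_sparse`, `decompress_sparse` and `sparse_dot` are hypothetical.

```python
def compress_sparse(values):
    """Keep only nonzero entries as (index, value) pairs, plus the original length."""
    return len(values), [(i, v) for i, v in enumerate(values) if v != 0]


def decompress_sparse(length, pairs):
    """Rebuild the dense vector from its compressed form."""
    dense = [0] * length
    for i, v in pairs:
        dense[i] = v
    return dense


def sparse_dot(compressed_weights, activations):
    """Dot product that performs a MAC only for the stored nonzero weights."""
    length, pairs = compressed_weights
    if length != len(activations):
        raise ValueError("weight and activation vectors differ in length")
    return sum(v * activations[i] for i, v in pairs)


weights = [0, 3, 0, 0, -2, 0]
activations = [5, 1, 7, 9, 4, 2]
compressed = compress_sparse(weights)
print(compressed)                           # (6, [(1, 3), (4, -2)])
print(sparse_dot(compressed, activations))  # 3*1 + (-2)*4 = -5
```

Here only two of the six weights are stored and only two multiplications are performed. Whether such a scheme pays off in hardware depends on the index overhead and on how sparse the data actually is.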

Supervisors: Maurizio Martina
Academic year: 2020/21
Publication type: Electronic
Number of Pages: 78
Degree programme: Master's degree programme in Electronic Engineering (Ingegneria Elettronica)
Degree class: New organization > Master science > LM-29 - ELECTRONIC ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/16665