Quantifying the figures of merit of MAC architectures for Deep Learning Accelerators

Ignacio Goldman

Quantifying the figures of merit of MAC architectures for Deep Learning Accelerators.

Supervisor: Andrea Calimera. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2018

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Artificial intelligence is advancing at a staggering pace and is spreading rapidly into many aspects of daily life, such as face and gesture recognition, vision, autonomous cars, remote sensing and robotics, agriculture, augmented reality, and biometrics, to name a few. The potential is even greater because modern approaches to artificial intelligence, such as Machine Learning or Deep Learning, can be applied on small devices such as smartphones, or on even smaller embedded systems with severe performance constraints. One of the main problems of these new approaches is resource usage. Convolutional Neural Networks (CNNs), for instance, need large amounts of data to work, which implies heavyweight computations during the training phase as well as during inference. For these reasons, many companies and research groups are working on new dedicated hardware solutions for accelerating CNN operations. In particular, NVIDIA has released, and is still working on, an open-source architecture for a CNN accelerator called NVDLA. This architecture has some interesting features, such as a convolution pipeline working with 16-bit floating-point operations (also referred to as the full-precision implementation) and high reconfigurability and modularity. Indeed, modules and cores are fully independent of one another, so they can be removed, replaced, or modified as needed. Taking advantage of these characteristics, the main objective of this thesis is to analyze and compare the figures of merit of the NVDLA architecture under different working conditions. Tested configurations include a full-precision 16-bit implementation and a reduced 8-bit implementation. Comparisons have been carried out in terms of area, power, and speed for each configuration.
In more detail, the investigation focuses on the most important component of a CNN, the convolutional module, where more than 90% of the operations are matrix-vector multiplications, that is, multiply-and-accumulate (MAC) operations. Therefore, to improve the efficiency of the NVDLA, two inaccurate multiplier architectures geared towards efficient mathematical operations were also included in the analyses. Preliminary results suggest that, although inaccurate multipliers introduce errors in MAC operations, these errors are not large enough to affect the prediction results. In other words, trading accuracy for area and power savings is possible, and the prediction accuracy does not vary abruptly.
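As a rough illustration of the trade-off described in the abstract, the following minimal Python sketch compares a MAC operation computed with an exact multiplier against one computed with an inaccurate multiplier. The operand-truncation scheme, the function names, and the `dropped_bits` parameter are assumptions chosen for simplicity; the abstract does not name the two inaccurate multiplier architectures that were evaluated, and this sketch is not the NVDLA implementation.

```python
def exact_mul(a, b):
    """Reference multiplier: the exact integer product."""
    return a * b


def truncated_mul(a, b, dropped_bits=2):
    """Hypothetical inaccurate multiplier (illustrative assumption).

    Clears the lowest `dropped_bits` bits of each operand's magnitude
    before multiplying, which in hardware would remove the partial
    products associated with those bits.
    """
    mask = ~((1 << dropped_bits) - 1)
    sign = -1 if (a < 0) != (b < 0) else 1
    return sign * ((abs(a) & mask) * (abs(b) & mask))


def mac(weights, activations, multiply):
    """Multiply-and-accumulate: sum the pairwise products of two vectors."""
    acc = 0
    for w, a in zip(weights, activations):
        acc += multiply(w, a)
    return acc


if __name__ == "__main__":
    # Example signed 8-bit operands (values in [-128, 127]).
    weights = [12, -7, 100, 33]
    activations = [5, 90, -128, 17]

    exact = mac(weights, activations, exact_mul)
    approx = mac(weights, activations, truncated_mul)
    rel_error = abs(approx - exact) / abs(exact)
    print(f"exact={exact} approx={approx} relative error={rel_error:.2%}")
```

In this toy case the accumulated result stays within a few percent of the exact one. Whether such an error changes a network's prediction depends on the model and on the multiplier design, which is the question the thesis investigates.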
Supervisors:	Andrea Calimera
Academic year:	2017/18
Publication type:	Electronic
Number of pages:	97
Subjects:
Degree programme:	Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class:	New organization > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies:	Not specified
URI:	http://webthesis.biblio.polito.it/id/eprint/18662
