
A CoSimulation Framework for Assessment of Power Knobs in Deep-Learning Accelerators

Antonio Cipolletta


Rel. Andrea Calimera. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2018

License: Creative Commons Attribution Non-commercial No Derivatives.

In the last few years, there has been a genuine renaissance of Machine Learning. Neural Networks, and especially Deep Neural Networks, have shown broad applicability, from object classification and detection to speech recognition and natural language processing. The training and inference of a neural network (NNet) are mainly executed on power-hungry systems such as high-performance CPUs, clusters of CPUs, and/or clusters of GPGPUs. The increased computational power of today's systems is a key element in understanding the diffusion of Deep NNets. Deep-learning algorithms are computationally intensive and require a very large memory footprint, but there are multiple advantages in moving the computation to the edge, near the sensor. For this reason, there is a real need to design optimization flows for deploying NNets on resource-constrained, low-power systems. Energy-oriented optimization algorithms, such as fixed-point quantization and weight pruning, have been studied in order to trade off the accuracy of the NNet against its energy consumption. Additional energy savings can be achieved through specific run-time power management strategies. The contribution of this work is a co-simulation framework that allows a rigorous assessment of power knobs in spatial architectures for deep-learning accelerators. The framework can be used in the early phase of the design since it does not need the complete RTL description of the accelerator microarchitecture, but only the post-synthesis netlist of the arithmetic circuits used and a software description of the dataflow in C/C++. The neural-network front-end is fully compatible with commonly used high-level frameworks such as PyTorch, Caffe2, and TensorFlow. The interconnection network between PEs and the control unit are completely emulated in software. All the arithmetic computations are performed through a gate-level simulation using QuestaSim by Mentor Graphics and its Foreign Language Interface (FLI).
FLI routines are C functions providing procedural access to information within the HDL simulator. In this work, they are used to control the simulation run and to emulate the power management. Since only the minimum amount of hardware is emulated at gate level, the simulation time remains sustainable. Nevertheless, it is possible to obtain a realistic estimation of the energy gain and the accuracy drop when approximate computing is exploited. This co-simulator is a first attempt at designing application-specific tools in the domain of full-stack deployment of Machine Learning on Chip. The theoretical study of AI will continue to produce challenges for hardware designers. In a context of increasing complexity of functional requirements and performance constraints, the new tool proposed in this thesis can help designers manage cross-layer multi-objective optimization.

Supervisor: Andrea Calimera
Academic year: 2018/19
Publication type: Electronic
Number of Pages: 68
Degree programme: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/9032