Architectural design of a configurable hardware accelerator for neural network processing.

Francesco Vaiana

Supervisor: Andrea Calimera. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2018


Machine learning applications have become widespread across many technological fields. They range from pattern recognition, such as image classification or data mining, to complex human-interaction applications, such as autonomous driving, natural language processing, or robotics. Deep neural networks are a large class of machine learning algorithms. They are composed of a cascade of several non-linear layers, such as convolutional, activation, down-sampling, and classification layers. The large amount of data processing needed to perform inference makes neural network processing ill-suited to standard processor architectures. This has motivated research into ASIC accelerators that, with a highly parallel spatial architecture, can process multiple data items more efficiently, enabling neural network processing alongside general-purpose computing systems. The goal of this thesis is to design a hardware accelerator for the inference process that enables new kinds of optimization by splitting the neuron array into two separate arrays: an array of multipliers and an array of accumulators. The two arrays belong to different clock domains and are linked by a dedicated network that synchronizes and dispatches data according to the chosen configuration. The proposed architecture is composed of a global buffer, a memory controller, and a processing element. The global buffer is a large scratchpad memory that stores input activations and weights as well as computed partial sums. The memory controller manages the configuration phase and the data movement, both between the buffer and the processing element and for I/O transfers. The processing element is the computational core of the accelerator. It is composed of an array of multipliers, a link network, and an array of accumulators. The role of the multipliers is to perform parallel multiplications between inputs.
They receive input data from the global buffer, together with the control signals that drive their computation. The link network, designed as a binary tree, routes each incoming convolutional term to the designated accumulator, with the working frequency increasing every two levels of the tree. This is a trade-off between the number of clocks to be generated and routed on the circuit and the bandwidth available for convolutional accumulation. The network uses local memories, designed as FIFO (First In, First Out) queues, to store incoming data in case of network congestion. Finally, virtual neurons are logically mapped onto the accumulators to store the convolution results in a local private memory. The design is parametric in the number of multipliers and accumulators, the data bit-width, and the buffer, FIFO, and address sizes, so that a Pareto optimality analysis can be performed. It has been validated with workloads obtained from both the convolutional and the fully connected layers of AlexNet, a convolutional neural network, and it has been synthesized with a 45 nm transistor library to extract the area, timing, and power figures of the design. The design phase focused on the programmability and flexibility of the dataflow that can be mapped onto the accelerator. These features account for the continuing evolution of neural network structures expected in the coming years, making it possible to map an arbitrary computational graph described by a given dataflow.
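
The processing-element dataflow described in the abstract (multiplier array, binary-tree link network with FIFO queues, accumulator array) can be illustrated with a small behavioral sketch in Python. This is not the thesis implementation, whose full text is not available. It assumes a power-of-two number of accumulators, unbounded FIFOs, one item forwarded per node per step, and a destination tag attached to each product. It does not model the separate clock domains or the frequency scaling across tree levels. The names `LinkNetwork` and `run_processing_element` are hypothetical.

```python
from collections import deque


class LinkNetwork:
    """Behavioral model of a binary-tree router feeding an accumulator array.

    Assumptions (not from the thesis): heap-style node numbering with the
    root at node 1, one unbounded FIFO per internal node, no clock modeling.
    """

    def __init__(self, num_accumulators):
        if num_accumulators < 2 or num_accumulators & (num_accumulators - 1):
            raise ValueError("num_accumulators must be a power of two >= 2")
        self.n = num_accumulators
        self.depth = num_accumulators.bit_length() - 1  # number of tree levels
        self.fifos = {node: deque() for node in range(1, self.n)}
        self.accumulators = [0] * self.n

    def push(self, value, dest):
        """Inject one product at the root, tagged with its accumulator index."""
        if not 0 <= dest < self.n:
            raise ValueError("destination out of range")
        self.fifos[1].append((value, dest))

    def step(self):
        """Advance every non-empty node by one item.

        Deeper nodes are visited first, so an item moves at most one level
        per step and waits in a FIFO when the node below it is busy.
        """
        for node in range(self.n - 1, 0, -1):
            if not self.fifos[node]:
                continue
            value, dest = self.fifos[node].popleft()
            level = node.bit_length() - 1  # root is level 0
            bit = (dest >> (self.depth - 1 - level)) & 1  # 0 = left, 1 = right
            child = 2 * node + bit
            if child >= self.n:  # the child is a leaf, i.e. an accumulator
                self.accumulators[child - self.n] += value
            else:
                self.fifos[child].append((value, dest))

    def busy(self):
        return any(self.fifos.values())


def run_processing_element(inputs, weights, dests, num_accumulators):
    """Multiply input/weight pairs, then route each product to its accumulator."""
    network = LinkNetwork(num_accumulators)
    for x, w, dest in zip(inputs, weights, dests):
        network.push(x * w, dest)  # multiplier array output
    while network.busy():
        network.step()
    return network.accumulators


if __name__ == "__main__":
    print(run_processing_element([1, 2, 3, 4], [10, 20, 30, 40], [0, 0, 1, 3], 4))
```

In this sketch the destination tag stands in for the configuration that decides which accumulator receives each convolutional term. Products bound for the same subtree queue in that subtree's FIFO, which is the congestion case the local memories are meant to absorb.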

Supervisors: Andrea Calimera
Academic year: 2018/19
Publication type: Electronic
Number of Pages: 62
Additional Information: Thesis under embargo. Full text not available
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/9819