
Flexible On-Chip Networks for Dynamic Dataflows on Convolutional Neural Network Accelerators

Gabriele Mario Caddeo

Flexible On-Chip Networks for Dynamic Dataflows on Convolutional Neural Network Accelerators.

Supervisor: Maurizio Martina. Politecnico di Torino, Master's degree programme in Electronic Engineering (Corso di laurea magistrale in Ingegneria Elettronica), 2020


Convolutional Neural Networks (CNNs) have become the state of the art for many computer vision tasks. Their highly parallel computation graph structure offers many opportunities for hardware optimization. While the precision and accuracy of state-of-the-art CNNs are remarkable, the resulting computational complexity and the significant amount of data movement pose a challenge in terms of energy efficiency. Focusing on moving the computational complexity off-chip and reducing the off-chip↔on-chip communication shows successful results in terms of energy efficiency and throughput. Because CNN execution is deterministic, the communication does not necessarily require the complex routing algorithms typically supported by Networks-on-Chip (NoCs). A lightweight and efficient on-chip interconnection can instead be created to support broadcast, multicast and unicast transfers between the memory and the processing elements. This Master's thesis was carried out during an exchange program at the Technical University of Munich (TUM), in collaboration with BMW and Xilinx, within the on-site Machine Learning research team at the BMW Autonomous Driving Campus. The thesis relies on the Output Stationary Smart Dataflow, exploiting time-division multiplexing (TDM) interconnections. The aim is to develop a low-power, flexible network that is able to sustain that dynamic dataflow, reducing complexity and increasing throughput compared to other state-of-the-art solutions. Varying the hardware constraints and the inputs reveals the best implementation in terms of throughput, latency and power. The design space exploration framework previously created by the team provided a starting point, with its outputs serving as this project’s inputs. After the specifications were fixed, a protocol was developed to use the interconnections in the most efficient way, depending on the inputs, supported by a mapping algorithm for the Processing Element sets and a smart division of the pixels in the on-chip memory.
Finally, a handshake between the flexible Network and the Processing Elements is necessary to provide a coherent Instruction Set.

Supervisors: Maurizio Martina
Academic year: 2020/21
Publication type: Electronic
Number of Pages: 102
Additional Information: Confidential thesis. Full text not available
Degree programme: Master's degree programme in Electronic Engineering (Corso di laurea magistrale in Ingegneria Elettronica)
Degree class: New organization > Master science > LM-29 - ELECTRONIC ENGINEERING
Joint supervision institution: Technische Universität München (Germany)
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/16615