
Optimizing Off-Chip Data Movement Using Layer Fusion and Loop Blocking Strategies

Emanuele Valpreda

Optimizing Off-Chip Data Movement Using Layer Fusion and Loop Blocking Strategies.

Supervisor: Maurizio Martina. Politecnico di Torino, Master's degree programme in Electronic Engineering (Ingegneria Elettronica), 2019


Convolutional Neural Networks (CNNs) are currently the most widely adopted approach for computer vision tasks. A rapidly growing use case is IoT devices embedded in battery-powered systems, such as autonomous vehicles, where low-power scheduling and optimization of memory usage are recommended because of the high communication demands of convolutional layers. However, selecting an energy-efficient schedule for a CNN is challenging and requires an extensive search of loop schedules. To tackle this problem, a design space exploration framework was developed to optimize CNNs proposed in the literature; it provides communication schedules and memory statistics such as energy and bandwidth usage. The generated hardware metrics could be used in a future co-design optimization approach for both CNNs and hardware, for instance by measuring the trade-off between the energy efficiency gained and the accuracy lost after pruning or a change of quantization. Scheduling starts from a high-level hardware description of a CNN accelerator and a target CNN model. The implemented framework exploits loop optimization techniques and novel inter-layer tiling and reuse strategies to efficiently determine the communication patterns for optimal memory usage across different layers, using the resulting dataflow mapping as feedback to optimize the overall energy. The sequential scheduling framework was validated against state-of-the-art tools on a set of different hardware configurations with AlexNet. The proposed fusion framework was used to generate a schedule for the first two convolutional layers of VGG16 on a row-stationary array with 256 kB of on-chip memory; on the same hardware configuration, it outperformed a sequential schedule generated with loop blocking techniques, reducing the overall energy by 32%.
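
To illustrate why fusing layers can reduce off-chip data movement, the following is a minimal sketch and not the framework described in the abstract. It assumes the standard VGG16 shapes for the first two convolutional layers (224x224 input, 3x3 kernels, 64 output channels each) and an idealised model in which each tensor crosses the off-chip boundary once per use. The function names are hypothetical, and the model ignores on-chip capacity, halo recomputation and energy per access, so it does not reproduce the 32% figure.

```python
# Idealised off-chip traffic model (an assumption for illustration only):
# every tensor is moved across the off-chip boundary exactly once per use.

def conv_layer(height, width, in_ch, out_ch, kernel=3):
    """Word counts for a stride-1, same-padded convolutional layer."""
    return {
        "ifmap": height * width * in_ch,
        "weights": kernel * kernel * in_ch * out_ch,
        "ofmap": height * width * out_ch,
    }

def sequential_traffic(layers):
    """Layer by layer: each layer reads its input and weights off chip
    and writes its output back off chip."""
    return sum(l["ifmap"] + l["weights"] + l["ofmap"] for l in layers)

def fused_traffic(layers):
    """Fused: intermediate feature maps stay on chip, so only the first
    input, all weights and the last output cross the off-chip boundary."""
    weights = sum(l["weights"] for l in layers)
    return layers[0]["ifmap"] + weights + layers[-1]["ofmap"]

# Assumed shapes of the first two VGG16 convolutional layers.
layers = [conv_layer(224, 224, 3, 64), conv_layer(224, 224, 64, 64)]
seq = sequential_traffic(layers)
fused = fused_traffic(layers)

print("sequential words moved:", seq)
print("fused words moved:     ", fused)
print("words saved by fusion: ", seq - fused)
```

In this simplified model the saving equals one write plus one read of the intermediate feature map. A real schedule must also tile the fused layers so that the working set fits in on-chip memory, which is where loop blocking comes in.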

Supervisors: Maurizio Martina
Academic year: 2019/20
Publication type: Electronic
Number of Pages: 102
Additional Information: Thesis under embargo. Full text not available
Degree programme: Master's degree programme in Electronic Engineering (Ingegneria Elettronica)
Degree class: New organization > Master science > LM-29 - ELECTRONIC ENGINEERING
Joint supervision institution: Technische Universität München (Germany)
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/13242