
Integrating Design Space Exploration in Modern Compilation Toolchains for Deep Learning

Mohamed Amine Hamdi

Supervisors: Daniele Jahier Pagliari, Alessio Burrello, Matteo Risso, Francesco Daghero. Politecnico di Torino, Master's degree program in Computer Engineering (Ingegneria Informatica), 2023

PDF (Tesi_di_laurea) - Thesis (5MB)
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

In recent years, the rapid growth of Artificial Intelligence (AI) and the proliferation of hardware devices with AI-specific features have created a rising demand for tools and frameworks that translate Deep Learning models from high-level languages such as Python into lower-level code, often C, optimized for a particular hardware target. This thesis focuses on heterogeneous edge systems, which have limited computational capabilities and memory and must prioritize energy efficiency. The diversity of hardware platforms and programming ecosystems makes porting AI models to every device a non-trivial task. An ideal solution would be a universal tool that translates high-level model representations, e.g., in Python, into code while accommodating different hardware constraints, programming languages, and interfaces. Unfortunately, achieving this without compromising performance remains challenging. For example, the TVM compiler stack is a popular open-source toolchain for deploying networks on many devices, including CPUs, GPUs, and ARM- and RISC-V-based microcontrollers (MCUs), but it falls short when generating code for heterogeneous Systems-on-Chip (SoCs) that contain different accelerators. An effective approach to this challenge is TVM-BYOC (Bring Your Own Codegen), an open-source framework built on TVM that targets AI accelerator vendors. BYOC relieves vendors from building and maintaining a full compiler stack: they can reuse TVM components and plug in optimized kernels for the layers their accelerator supports, focusing solely on optimizing their own kernel library to fully exploit their hardware. HTVM, for instance, follows this approach and integrates TVM with DORY, an end-to-end automatic deployment tool for MCUs with multi-level memory hierarchies that performs multi-level tiling via Constraint Programming. HTVM generates C code directly and offers more flexibility than vendor-specific stacks manually tuned for the hardware.
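The BYOC idea of splitting a model between a vendor kernel library and TVM's default code generator can be illustrated with a toy example. The sketch below is not TVM's API: the `Layer` class, the `ACCEL_SUPPORTED` operator set, and the `partition` function are hypothetical, and real BYOC flows operate on TVM's graph representation rather than on a flat list of layers. Assuming a simple linear chain of layers, it shows only the core decision: consecutive layers the accelerator supports are grouped into one offloaded region, and everything else stays on the host.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical set of operators covered by the accelerator's kernel library.
ACCEL_SUPPORTED = {"conv2d", "relu", "add", "dense"}


@dataclass
class Layer:
    name: str
    op: str


def partition(layers: List[Layer], supported=ACCEL_SUPPORTED) -> List[Tuple[str, List[str]]]:
    """Group consecutive layers into regions tagged 'accel' (offloaded) or 'host'."""
    regions: List[Tuple[str, List[str]]] = []
    for layer in layers:
        target = "accel" if layer.op in supported else "host"
        if regions and regions[-1][0] == target:
            regions[-1][1].append(layer.name)  # extend the current region
        else:
            regions.append((target, [layer.name]))  # start a new region
    return regions


model = [
    Layer("conv1", "conv2d"),
    Layer("relu1", "relu"),
    Layer("pool1", "maxpool2d"),
    Layer("conv2", "conv2d"),
    Layer("softmax", "softmax"),
]
for target, names in partition(model):
    print(target, names)
```

Merging adjacent supported layers into a single region is a simplifying choice in this toy model that reduces the number of host-accelerator hand-offs; how HTVM actually decides what to offload is not reproduced here.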
This thesis builds upon HTVM by replacing DORY with a more flexible tool, ZigZag, in the TVM-BYOC flow. ZigZag is a Design Space Exploration tool that, given a description of the target accelerator and of the workload, identifies optimal temporal mappings within a vast search space. ZigZag's representation relies on loops and their ordering, and its internal memory allocator can produce unevenly mapped schedules, which is crucial for edge devices with limited memory. Unlike DORY, ZigZag could not previously generate code. This limitation is addressed through two main integration interfaces: the TVM-to-ZigZag interface, which exports the layer structure from TVM to ZigZag, and the ZigZag-to-TVM interface, a code-generation template that accounts for the loop order and tiling information provided by ZigZag. Experiments on DIANA, a heterogeneous platform comprising a RISC-V core, a digital AI accelerator, and an analog one, showed significant improvements over HTVM, obtained primarily by refining the accelerator cost model in ZigZag. More specifically, when executing a set of 2D convolutional layers with varying hyper-parameters on the DIANA digital accelerator, an average performance improvement of 67% was observed relative to the existing ZigZag model. Compared to HTVM, we obtained an average speed-up of 26% on the same layers, with peaks of up to 56%.
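To make the idea of a loop-based temporal mapping concrete, the sketch below executes a 2D convolution following an arbitrary loop order and tiling. It is an illustration under stated assumptions, not ZigZag's data format or the thesis's code-generation template: the mapping format (loops listed outermost to innermost as `(dimension, size)` pairs, with a dimension possibly split across several loops), the dimension names `K`, `C`, `OY`, `OX`, `FY`, `FX`, and the functions `conv2d_naive` and `conv2d_mapped` are hypothetical. It checks that a tiled, reordered loop nest computes the same result as the reference nest; choosing which valid ordering is fastest on a given memory hierarchy is the job of a cost model such as ZigZag's.

```python
import itertools
import random
from math import prod
from typing import Dict, List, Tuple

# Hypothetical mapping format: loops listed outermost -> innermost as (dimension, size).
# A dimension may be split over several loops (tiling); the sizes of its loops must
# multiply to the dimension's full extent.
Mapping = List[Tuple[str, int]]
DIM_NAMES = ("K", "C", "OY", "OX", "FY", "FX")


def conv2d_naive(x, w, dims: Dict[str, int]):
    """Reference 2D convolution (stride 1, no padding) on nested lists."""
    K, C, OY, OX, FY, FX = (dims[n] for n in DIM_NAMES)
    y = [[[0] * OX for _ in range(OY)] for _ in range(K)]
    for k in range(K):
        for oy in range(OY):
            for ox in range(OX):
                for c in range(C):
                    for fy in range(FY):
                        for fx in range(FX):
                            y[k][oy][ox] += x[c][oy + fy][ox + fx] * w[k][c][fy][fx]
    return y


def conv2d_mapped(x, w, dims: Dict[str, int], mapping: Mapping):
    """Same convolution, iterating in the loop order and tiling given by `mapping`."""
    for d, extent in dims.items():
        if prod(s for dd, s in mapping if dd == d) != extent:
            raise ValueError(f"loops for {d} do not cover its extent {extent}")
    # Each loop's stride within its dimension is the product of the sizes
    # of the inner loops over the same dimension.
    strides = [prod(s for dd, s in mapping[i + 1:] if dd == d)
               for i, (d, _) in enumerate(mapping)]
    y = [[[0] * dims["OX"] for _ in range(dims["OY"])] for _ in range(dims["K"])]
    # itertools.product varies its last range fastest, i.e. the innermost loop.
    for counters in itertools.product(*(range(s) for _, s in mapping)):
        idx = dict.fromkeys(DIM_NAMES, 0)
        for (d, _), stride, i in zip(mapping, strides, counters):
            idx[d] += i * stride
        k, c, oy, ox, fy, fx = (idx[n] for n in DIM_NAMES)
        y[k][oy][ox] += x[c][oy + fy][ox + fx] * w[k][c][fy][fx]
    return y


# Small random layer to compare both loop nests.
dims = {"K": 4, "C": 2, "OY": 4, "OX": 4, "FY": 3, "FX": 3}
random.seed(0)
x = [[[random.randint(-3, 3) for _ in range(dims["OX"] + dims["FX"] - 1)]
      for _ in range(dims["OY"] + dims["FY"] - 1)] for _ in range(dims["C"])]
w = [[[[random.randint(-3, 3) for _ in range(dims["FX"])] for _ in range(dims["FY"])]
      for _ in range(dims["C"])] for _ in range(dims["K"])]

# Example mapping: K and OX are each split into two loops of size 2.
mapping = [("OY", 4), ("K", 2), ("C", 2), ("OX", 2), ("K", 2), ("OX", 2), ("FY", 3), ("FX", 3)]
print(conv2d_mapped(x, w, dims, mapping) == conv2d_naive(x, w, dims))  # prints True
```

Every valid mapping visits each (K, C, OY, OX, FY, FX) point exactly once, so only the order of memory accesses changes, which is precisely what a cost model has to evaluate.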

Supervisors: Daniele Jahier Pagliari, Alessio Burrello, Matteo Risso, Francesco Daghero
Academic year: 2023/24
Publication type: Electronic
Number of pages: 95
Subjects:
Degree program: Master's degree program in Computer Engineering (Ingegneria Informatica)
Degree class: New regulations > Master's degree > LM-32 - Computer Engineering
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/28446