Towards Scalable and Energy-Efficient AI/ML Hardware Accelerators
Lorenzo Ruotolo
Supervisor: Daniele Jahier Pagliari. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2024
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
As transistors enter the sub-nanometer era, energy efficiency has become essential for high-performance hardware architectures, especially in high-end data-center accelerators running complex workloads such as Convolutional Neural Networks (CNNs) and other Deep Learning (DL) applications. Following this trend, new architectures are emerging. One of them is the Soft-SIMD functional unit, a SIMD unit in which sub-word parallelism is managed in software rather than fixed in hardware. This architecture supports the flexible use of low bit-width data types (as low as 3 bits), improving parallel performance on both uniformly and heterogeneously quantized (UQ and HQ) CNNs compared to hardware-based counterparts (hard-SIMD). The design also employs shift-add-based Canonical Signed Digit (CSD) multiplication, which further reduces area by 59.9% and energy consumption by 50.1% relative to hard-SIMD, with only minor performance degradation.
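To make the CSD idea concrete, the following is a minimal Python sketch (not the thesis's hardware implementation) of recoding a multiplier constant into canonical signed digits, where every digit is in {-1, 0, +1} and no two adjacent digits are nonzero, so a constant multiplication reduces to a few shifts plus adds/subtracts:

```python
def csd_digits(k):
    """Recode integer k into CSD digits in {-1, 0, +1}, LSB first.

    The choice d = 2 - (k mod 4) maps k % 4 == 1 to +1 and
    k % 4 == 3 to -1, which guarantees no two adjacent nonzero digits.
    """
    digits = []
    while k != 0:
        if k & 1:
            d = 2 - (k & 3)  # +1 or -1 depending on the two low bits
            k -= d           # remove the digit's contribution
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits

def csd_multiply(x, k):
    """Multiply x by constant k using only shifts and adds/subtracts."""
    acc = 0
    for shift, d in enumerate(csd_digits(k)):
        if d:
            acc += d * (x << shift)  # one shift-add (or shift-subtract) per nonzero digit
    return acc
```

For example, 7 = 111 in binary (three add terms) recodes to CSD as +1,0,0,-1 (i.e. 8 - 1, two terms), which is the source of the area and energy savings over a full multiplier array.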
Another architecture central to this work is the Very Wide Register (VWR), which mitigates the high energy cost of frequent and repetitive memory accesses in DL workloads by organizing registers as a very wide but extremely shallow (single-bit data line) single-ported memory array.
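The functional effect of a very wide register (one wide access serving many narrow operands) can be sketched in Python as packing and unpacking sub-word lanes; `LANE_BITS` and `NUM_LANES` here are illustrative assumptions, not parameters taken from the thesis:

```python
LANE_BITS = 3   # assumed narrow lane width, matching the 3-bit minimum mentioned above
NUM_LANES = 16  # hypothetical number of lanes in one very wide word

def pack(vals):
    """Pack narrow unsigned operands into one wide register word."""
    assert len(vals) <= NUM_LANES
    mask = (1 << LANE_BITS) - 1
    word = 0
    for i, v in enumerate(vals):
        word |= (v & mask) << (i * LANE_BITS)
    return word

def unpack(word, n):
    """Recover n narrow operands from a single wide read."""
    mask = (1 << LANE_BITS) - 1
    return [(word >> (i * LANE_BITS)) & mask for i in range(n)]
```

In this model a single `pack`/`unpack` round-trip stands in for one wide register access that amortizes its cost over all lanes, instead of one narrow memory access per operand.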