Politecnico di Torino

Deep Recommender Models Data Flow Optimization for AI Accelerators

Giuseppe Ruggeri


Rel. Daniele Jahier Pagliari. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2023

License: Creative Commons Attribution Non-commercial No Derivatives.


Deep Learning-based Recommender Models (DLRMs) have become indispensable tools for businesses to provide effective personalized recommendations to end users. The workload introduced by these models is therefore extremely relevant, representing, for instance, more than 79% of the AI workload in Meta's data centers. Optimizing such models is thus crucial and can lead to significant energy savings, as well as increased throughput and better real-time responsiveness.

State-of-the-art DLRMs suffer from severe performance limitations due to embedding layers, which project sparse categorical features onto dense, continuous embedding vectors. In particular, the bottleneck lies in the large number of random memory accesses performed to retrieve a multitude of small embedding vectors from look-up tables stored in off-chip memory. To mitigate this issue, some existing approaches exploit the large bandwidth offered by High Bandwidth Memory (HBM), while others propose building clusters of heterogeneous nodes that exploit the advantages of each platform. Furthermore, some methods model embedding access patterns to place "hot" rows in a cache, and/or build an entire hierarchical memory system tailored to the embedding lookups. However, existing approaches are limited by the variable size of the models (from a few MBs to hundreds of GBs), as well as by their dependency on input query distributions.

The goal of this thesis is the study and design of embedding lookup dataflows for the Huawei Ascend AI processors (Ascend 310/910), focusing in particular on exploiting the available software-controlled on-chip buffers (scratchpad memories) as effectively as possible. More specifically, the work focuses on four different strategies that determine which buffer is used to store embedding tables and how lookups are performed.
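As an illustration of the bottleneck described above (a sketch, not code from the thesis), an embedding lookup is essentially a row gather from a table indexed by sparse categorical IDs; the table shape, batch size, and variable names below are hypothetical:

```python
import numpy as np

# Hypothetical embedding table: many small vectors of dimension 16.
# In real DLRMs, tables range from a few MBs to hundreds of GBs.
num_rows, dim = 1_000_000, 16
table = np.random.rand(num_rows, dim).astype(np.float32)

# A batch of sparse categorical IDs; random IDs mean each lookup is
# an effectively random access into off-chip memory.
batch = np.random.randint(0, num_rows, size=256)

# The lookup itself: one small, scattered read per ID. This gather
# pattern, not arithmetic, is what limits DLRM throughput.
vectors = table[batch]            # shape: (256, 16)

# Many DLRMs then pool the gathered vectors (e.g., a sum reduction).
pooled = vectors.sum(axis=0)      # shape: (16,)
```

Because each gathered row is tiny relative to a DRAM burst, the access pattern wastes bandwidth, which is what motivates keeping hot tables in fast on-chip buffers.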
Two of the strategies build on the idea of persistently preloading the tables into one relatively large on-chip L1 buffer, so that fast lookups can then be performed from there. Moreover, since the AI accelerators are multi-core, the embedding layer workload is split following two different parallelization approaches. One exploits the classical single-instruction-multiple-data (SIMD) paradigm, which splits the input batch evenly across the cores. The other leverages a multiple-instruction-multiple-data (MIMD) paradigm, resulting in asymmetric core execution and unlocking the possibility of splitting the tables into chunks preloaded into the L1 buffer of specific cores. Since different strategies are more effective for differently shaped tables, two policy optimization problems are solved through heuristics and greedy solutions.

Through extensive experiments on both real embedding tables from a production model and synthetic ones, the proposed strategies and policies are compared against a black-box baseline obtained from a sophisticated compiler (ATC), which applies various optimizations and exploits built-in operators written by experts. Results show that the baseline is extremely dependent on the input query distribution, suffering a performance drop of more than one order of magnitude when the input query is fixed. In contrast, the proposed strategies are not only independent of the input query distribution, but also provide better throughput vs. worst-case latency trade-offs than performing lookups directly from global memory for the majority of the considered combinations of table dimensions and input distributions.
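The two parallelization schemes above can be sketched in plain Python; the core count, batch, and routing rule are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

num_cores = 4
num_rows = 1_000            # rows in one hypothetical embedding table
batch = np.arange(256)      # query IDs for that table

# SIMD-style split: every core runs the same lookup kernel on an
# even slice of the input batch.
simd_shards = np.array_split(batch, num_cores)

# MIMD-style split: the *table* is partitioned into contiguous row
# chunks, each preloaded into the L1 buffer of one core; a query ID
# is routed to the core owning its chunk, so per-core work is uneven
# (asymmetric execution).
chunk = num_rows // num_cores
owner = np.minimum(batch // chunk, num_cores - 1)
mimd_shards = [batch[owner == c] for c in range(num_cores)]
```

Note how the SIMD shards are balanced by construction, while the MIMD shards depend on which rows the queries hit, which is why the strategy choice per table becomes an optimization problem.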

Relators: Daniele Jahier Pagliari
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 107
Degree course: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: Huawei Technologies Switzerland AG
URI: http://webthesis.biblio.polito.it/id/eprint/27690