Javier Jesus Poveda Rodrigo
Inference optimization of Large Language Models on RISC-V HPC platforms.
Advisors: Daniele Jahier Pagliari, Mohamed Amine Hamdi, Alessio Burrello. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2024
PDF (Tesi_di_laurea), 9MB – Thesis. License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Over the past decade, there have been significant improvements in Artificial Intelligence (AI), particularly in the area of natural language processing (NLP), thanks to the emergence of Transformers and, more generally, of Large Language Models (LLMs). These models have enabled numerous deep-learning applications such as translation, text generation, image generation, and many others. However, transformer-based models present new challenges because of their computationally intensive attention mechanisms and extremely high memory footprint. Even though these workloads are typically offloaded to GPUs, some applications and use cases require the CPU as the workhorse because of its lower cost and greater flexibility.
For instance, while training is too burdensome for CPU environments, CPUs are well suited to single-example or even batched inference.
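To make the cost argument concrete, the sketch below is a minimal NumPy implementation of single-head scaled dot-product attention (an illustration of the general mechanism, not code from the thesis). The (seq_len × seq_len) score matrix is what makes attention quadratic in sequence length, in both compute and memory.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Minimal single-head attention (illustrative sketch only).

    q, k, v: (seq_len, d) arrays. The intermediate score matrix has
    shape (seq_len, seq_len), so cost grows quadratically with the
    sequence length.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (seq_len, d)

rng = np.random.default_rng(0)
seq_len, d = 128, 64
q = rng.standard_normal((seq_len, d))
out = scaled_dot_product_attention(q, q, q)
```

Doubling `seq_len` quadruples the size of the score matrix, which is why long-context inference stresses memory bandwidth on CPUs in particular.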