A Model-Distributed Inference Approach for Large Language Models at the Edge

Davide Macario

A Model-Distributed Inference Approach for Large Language Models at the Edge.

Supervisors: Michela Meo, Erdem Koyuncu. Politecnico di Torino, Master's degree programme in ICT for Smart Societies (Ict Per La Società Del Futuro), 2024

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

We present an implementation of Model-Distributed Inference for Large Language Models ("MDI-LLM"), a framework designed to deploy state-of-the-art LLMs across a network of low-power edge devices. The model layers are partitioned into chunks, each assigned to a different node, and the nodes exchange intermediate activations over a wireless network. To optimize this process, we introduce "recurrent pipeline parallelism," a technique that minimizes idle time for each device and enables parallel inference when generating multiple pieces of text. By pooling the computational resources of multiple edge devices, MDI-LLM makes it possible to deploy models too large to fit on a single edge device and to run inference on cheap hardware. In addition, increasing the number of cooperating devices increases token generation rates while reducing per-device memory consumption.
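
The two ideas in the abstract, layer partitioning and recurrent pipeline parallelism, can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names `partition_layers` and `pipeline_schedule` are hypothetical, the split is assumed to be contiguous and as even as possible, and the schedule assumes an idealized pipeline in which every chunk takes one time step, communication is free, and the output of the last node feeds back to the first node to generate the next token.

```python
from typing import List, Optional


def partition_layers(n_layers: int, n_nodes: int) -> List[range]:
    """Split layer indices 0..n_layers-1 into contiguous chunks, one per node.

    Assumption: chunks are as even as possible; the first nodes get one
    extra layer when n_layers is not a multiple of n_nodes.
    """
    if n_nodes <= 0 or n_layers < n_nodes:
        raise ValueError("need at least one layer per node")
    base, extra = divmod(n_layers, n_nodes)
    chunks, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def pipeline_schedule(
    n_nodes: int, n_samples: int, n_steps: int
) -> List[List[Optional[int]]]:
    """For each time step, list which sample each node processes (None = idle).

    Assumption: sample s enters node 0 at step s, advances one node per
    step, and wraps around to node 0 after the last node (the recurrent part).
    """
    if not 0 < n_samples <= n_nodes:
        raise ValueError("this sketch supports 1..n_nodes samples")
    schedule = []
    for t in range(n_steps):
        row: List[Optional[int]] = [None] * n_nodes
        for s in range(n_samples):
            if t >= s:  # sample s has entered the pipeline
                row[(t - s) % n_nodes] = s
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    print(partition_layers(10, 3))
    for step, row in enumerate(pipeline_schedule(3, 3, 5)):
        print(step, row)
```

Under these assumptions, generating as many pieces of text as there are nodes keeps every node busy from step `n_nodes - 1` onward, whereas a single sample leaves all but one node idle at each step.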

Supervisors: Michela Meo, Erdem Koyuncu
Academic year: 2023/24
Publication type: Electronic
Number of Pages: 82
Subjects:
Degree programme: Master's degree programme in ICT for Smart Societies (Ict Per La Società Del Futuro)
Degree class: New organization > Master science > LM-27 - TELECOMMUNICATIONS ENGINEERING
Joint supervision institution: University of Illinois at Chicago (United States of America)
Collaborating companies: University of Illinois at Chicago
URI: http://webthesis.biblio.polito.it/id/eprint/31718