Generative AI for Real-Time Image Captioning on Embedded Neural Processing Unit

Marco Donnarumma

Supervisors: Carlo Masone, Ilario Gerlero, Marcello Babbi. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

Abstract

The emergence of Edge AI has opened new possibilities for deploying machine learning models on power-constrained devices, enabling real-time, private, and efficient inference directly on embedded platforms. Among the tasks that could benefit from on-device inference, image captioning remains particularly challenging because of the computational demands of vision-language (VL) models, which typically rely on large-scale transformer architectures. This thesis addresses the gap between high-performing generative captioning and efficient edge deployment by adapting Microsoft's GIT-Base model for execution on the Hailo-8 Neural Processing Unit (NPU). We present a full pipeline for real-time image-to-text generation on an embedded system built around the i.MX8M Plus SoC and the Hailo-8 NPU. To make GIT-Base compatible with the stringent constraints of edge hardware, we optimized both the encoder and the decoder, and thoroughly reengineered the decoder architecture.

This work included fixed-point quantization (INT8 and INT16), approximation of operations that the target hardware does not support, and the introduction of input padding to decouple inference from data-dependent shape variability.
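The abstract does not give implementation details, so the following is only a minimal sketch of two of the ideas it names: padding a token sequence to a fixed length, so that tensor shapes no longer depend on the data, and symmetric INT8 quantization. It uses plain Python and does not use the Hailo toolchain or the GIT-Base code. All names and values (`MAX_LEN`, `PAD_ID`, `pad_tokens`, `quantize_int8`, `dequantize`) are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

MAX_LEN = 8   # hypothetical fixed decoder input length (not from the thesis)
PAD_ID = 0    # hypothetical padding token id (not from the thesis)


def pad_tokens(token_ids: List[int], max_len: int = MAX_LEN,
               pad_id: int = PAD_ID) -> Tuple[List[int], List[int]]:
    """Pad a token sequence to a fixed length.

    Returns (padded_ids, mask), where mask is 1 for real tokens and
    0 for padding, so later stages can ignore the padded positions.
    """
    if len(token_ids) > max_len:
        raise ValueError("sequence longer than the fixed input length")
    n_pad = max_len - len(token_ids)
    padded = list(token_ids) + [pad_id] * n_pad
    mask = [1] * len(token_ids) + [0] * n_pad
    return padded, mask


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization: q = round(x / scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate floating-point values."""
    return [v * scale for v in q]


if __name__ == "__main__":
    ids, mask = pad_tokens([101, 7, 42])
    print(ids, mask)
    codes, scale = quantize_int8([-2.0, 0.0, 0.5, 2.0])
    print(codes, dequantize(codes, scale))
```

With every input padded to the same length, a model compiled for one static shape can serve captions of any length up to that limit, at the cost of some wasted computation on the padded positions. In this symmetric scheme, the error of each value after quantization and dequantization is at most half of `scale`.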

Publication type

Electronic

URI

https://webthesis.biblio.polito.it/id/eprint/36387