Design and implementation of a deployment tool for modular DNN inference using ZeroMQ-based GPU-aware communication

Dario Antonio Ruta


Supervisors: Carla Fabiana Chiasserini, Corrado Puligheddu. Politecnico di Torino, Master's degree in ICT for Smart Societies (ICT per la Società del Futuro), 2025.


License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Deep Neural Networks (DNNs) are the fundamental building block for providing smart services in a wide range of AI applications. However, DNN-based tasks have high computing requirements, which pose major challenges for their deployment on small, resource-constrained devices such as mobile phones or IoT devices. To address this issue, some solutions rely on model compression techniques to reduce both the computational burden on the device and the model's memory footprint. Other strategies offload the task, partially or fully, to more powerful computing platforms placed at the edge of new-generation mobile networks (5G Multi-access Edge Computing, 5G-MEC), ensuring low latency and near-zero computing cost on the mobile device.
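The trade-off behind partial offloading can be illustrated with a minimal sketch of a common layer-wise split model (the style popularized by split-computing work such as Neurosurgeon), not the method developed in this thesis. It assumes that the first `split` layers run on the device, the intermediate tensor is sent over a link of fixed bandwidth, and the remaining layers run at the edge; downloading the final result is ignored. The names `Layer`, `split_latency` and `best_split`, and all timings and sizes, are hypothetical and purely illustrative, not measurements from this work.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    device_ms: float  # hypothetical time to run this layer on the mobile device
    edge_ms: float    # hypothetical time to run this layer on the edge server
    out_bytes: int    # size of this layer's output tensor in bytes


def split_latency(layers, split, input_bytes, bandwidth_bps, rtt_ms=0.0):
    """Estimate end-to-end latency (ms) when layers[:split] run locally and
    layers[split:] run at the edge.

    split == len(layers) -> fully local inference, nothing is transmitted.
    split == 0           -> full offloading, the raw input is transmitted.
    """
    local_ms = sum(layer.device_ms for layer in layers[:split])
    if split == len(layers):
        return local_ms
    # Payload is the raw input or the output of the last locally executed layer.
    payload = input_bytes if split == 0 else layers[split - 1].out_bytes
    transfer_ms = payload * 8 / bandwidth_bps * 1000.0 + rtt_ms
    remote_ms = sum(layer.edge_ms for layer in layers[split:])
    return local_ms + transfer_ms + remote_ms


def best_split(layers, input_bytes, bandwidth_bps, rtt_ms=0.0):
    """Return the split index (0..len(layers)) with the lowest estimated latency."""
    return min(
        range(len(layers) + 1),
        key=lambda s: split_latency(layers, s, input_bytes, bandwidth_bps, rtt_ms),
    )


if __name__ == "__main__":
    # Illustrative three-layer model whose second layer shrinks the tensor a lot.
    model = [Layer(20, 2, 400_000), Layer(30, 3, 20_000), Layer(100, 4, 1_000)]
    for s in range(len(model) + 1):
        print(s, round(split_latency(model, s, 600_000, 10_000_000), 3))
    print("best split:", best_split(model, 600_000, 10_000_000))
```

With these made-up numbers and a 10 Mbit/s link, running the first two layers locally and sending the small intermediate tensor (about 70 ms) beats both fully local inference (150 ms) and full offloading (about 489 ms). On a much faster link, full offloading becomes the best choice, which is why the split point is typically re-evaluated as network conditions change.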

In such a scenario, mobile devices can consider DNN tasks as on-demand services