Ivan Zaino
Optimization of Variational Bayes Gaussian Splatting Algorithms on Embedded GPU.
Supervisors: Alessio Burrello, Daniele Jahier Pagliari. Politecnico di Torino, NOT SPECIFIED, 2025
PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution. Download (7MB)
Abstract:

The deployment of demanding computer vision algorithms on embedded devices is a challenging task due to limited memory, limited computational capability, and tight energy constraints. In many low-latency robotic and autonomous applications, edge deployment is indispensable, as real-time response cannot rely on remote servers. This work focuses on the optimization of the Variational Bayes Gaussian Splatting (VBGS) algorithm, a state-of-the-art probabilistic 3D scene modelling method originally developed for server-grade GPUs with more than 20 GB of RAM and not previously implemented on edge platforms. We target the NVIDIA Jetson Orin Nano, an embedded GPU platform with just 8 GB of RAM. Rather than modifying the mathematical model itself, this work optimizes the existing implementation to fit the constraints of the edge device and to reduce overall latency. The implementation is based on JAX, a high-performance numerical computing library that enables GPU acceleration and just-in-time (JIT) compilation. JIT compilation proved to be a pivotal optimization tool: by staging large portions of the computation into fused, statically optimized kernels, we significantly reduced Python-level overhead and improved data throughput on the Nano's limited GPU.

The optimization process began with in-depth memory and latency profiling to identify the main computational bottlenecks. The first improvement targeted a suboptimal memory allocation scheme: by avoiding unnecessary intermediate allocations during matrix multiplications, we reduced peak memory usage by up to 75%.

The second major improvement was achieved through data quantization. Since the original algorithm was designed around double-precision (fp64) arithmetic for numerical stability, many operations used excessively high precision relative to their actual requirements, leading to significant overhead in both memory and computation. To address this, we developed an automatic mixed-precision search algorithm that systematically evaluates each operation and selectively lowers its precision whenever this does not compromise output accuracy. This strategy yielded substantial gains in both training speed and memory efficiency, while keeping the loss in output quality minimal and often negligible. Mixed precision proved pivotal in the optimization process, boosting training speed by a factor of five while preserving output quality. Importantly, the mixed-precision search algorithm was designed to be generic rather than tied to VBGS specifically: it provides a systematic framework for identifying precision-sensitive operations and can therefore be applied to a wide range of numerical algorithms. This makes it a valuable tool for future work aiming to deploy computationally demanding models on memory- and power-constrained devices.
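As a rough illustration of the JIT staging described in the abstract, the sketch below shows how a JAX update step can be compiled into a single fused, statically optimized kernel with `jax.jit`. The function, shapes, and array names are illustrative placeholders, not code from the thesis, and the computation is only a simplified responsibility-style update.

```python
import jax
import jax.numpy as jnp

@jax.jit  # stage the whole step into one fused, statically optimized kernel
def update_step(means, covs, points):
    # Illustrative computation over N points and K Gaussian components;
    # the actual VBGS update is more involved.
    diffs = points[:, None, :] - means[None, :, :]                  # (N, K, D)
    precisions = jnp.linalg.inv(covs)                               # (K, D, D)
    mahalanobis = jnp.einsum('nkd,kde,nke->nk', diffs, precisions, diffs)
    weights = jax.nn.softmax(-0.5 * mahalanobis, axis=1)            # (N, K)
    # Weighted mean update per component.
    new_means = jnp.einsum('nk,nd->kd', weights, points) / (
        weights.sum(axis=0)[:, None] + 1e-8)
    return new_means

# Example shapes (hypothetical): 10_000 points, 256 components, 3-D positions.
means = jnp.zeros((256, 3))
covs = jnp.tile(jnp.eye(3), (256, 1, 1))
points = jnp.ones((10_000, 3))
new_means = update_step(means, covs, points)  # compiled once, then reused
```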
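The intermediate-allocation issue mentioned in the abstract can be illustrated, under assumed shapes, by contrasting a version that materializes a large temporary tensor with a single fused contraction. The specific computation below is a hypothetical example, not the thesis implementation.

```python
import jax.numpy as jnp

def weighted_scatter_naive(weights, diffs):
    # Materializes an (N, K, D, D) intermediate before reducing over N.
    outers = diffs[:, :, :, None] * diffs[:, :, None, :]   # (N, K, D, D)
    return jnp.einsum('nk,nkde->kde', weights, outers)     # (K, D, D)

def weighted_scatter_fused(weights, diffs):
    # Single contraction: avoids materializing the (N, K, D, D) tensor,
    # which lowers peak memory on the device.
    return jnp.einsum('nk,nkd,nke->kde', weights, diffs, diffs)
```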
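Below is a minimal sketch of a greedy mixed-precision search in the spirit described above. It assumes a hypothetical `run_fn(inputs, precisions)` callable that executes the pipeline with a per-operation dtype assignment; the actual search procedure in the thesis may differ.

```python
import jax
import jax.numpy as jnp

# Needed so jnp.float64 is not silently downcast to float32 by JAX.
jax.config.update("jax_enable_x64", True)

def search_mixed_precision(ops, run_fn, inputs, tol=1e-3):
    """Greedy per-operation precision search (illustrative sketch).

    `ops` is a list of operation names and `run_fn(inputs, precisions)` is a
    hypothetical callable that runs the pipeline with the given per-operation
    dtypes; both are assumptions made for this example.
    """
    precisions = {op: jnp.float64 for op in ops}
    reference = run_fn(inputs, precisions)            # full fp64 baseline
    for op in ops:
        for candidate in (jnp.float32, jnp.float16):
            trial = {**precisions, op: candidate}
            output = run_fn(inputs, trial)
            error = float(jnp.max(jnp.abs(output - reference)))
            if error <= tol:
                precisions = trial                     # accept lower precision
            else:
                break                                  # keep last accepted dtype
    return precisions
```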
| | |
|---|---|
| Supervisors: | Alessio Burrello, Daniele Jahier Pagliari |
| Academic year: | 2025/26 |
| Publication type: | Electronic |
| Number of pages: | 84 |
| Subjects: | |
| Degree programme: | NOT SPECIFIED |
| Degree class: | New regulations > Master's degree > LM-25 - AUTOMATION ENGINEERING |
| Collaborating companies: | Politecnico di Torino |
| URI: | http://webthesis.biblio.polito.it/id/eprint/37944 |