
Slimmable and Early Exit Neural Networks for Object Detection on Nano-Drones

Carlo Marra


Supervisors: Daniele Jahier Pagliari, Alessio Burrello, Beatrice Alessandra Motetti. Politecnico di Torino, 2025

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:

Deploying deep learning on nano-drone platforms imposes strict constraints on latency, memory, and energy. This thesis investigates dynamic inference solutions to tackle this problem, focusing on object detection, a common task in applications such as navigation, obstacle avoidance, and search and rescue. The proposed approach modifies a MobileNetV2 SSDLite (512×512 input) into a so-called slimmable model, in which four width configurations (0.25×, 0.5×, 0.75×, 1.0×) can be dynamically selected for the layers after the 6th feature extractor block. The four widths share a single set of weights, except for width-private batch normalization statistics, and thus incur minimal memory overhead with respect to the original model. Width selection can be performed on a per-sample basis, for example depending on external conditions such as remaining battery life or expected task difficulty. In addition to this multi-width operation, the model supports an adaptive mode: a binary gate classifier, added at the end of the fixed backbone, is trained to detect empty or trivial frames and trigger an early-exit strategy, further reducing computation without excessive performance loss. Training is divided into two phases. In the first, the slimmable detector is optimized across all widths using in-place ensemble bootstrapping with an EMA teacher and knowledge distillation; this strategy ensures stable convergence and preserves accuracy at every configuration. In the second, the gate is trained for adaptive inference. By learning to discriminate between empty and labeled images, the gate reduces the computational effort for empty frames, either by directly returning an empty prediction or by forcing execution of the slimmest head, which can correct potential false negatives.
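To illustrate the shared-weight, width-private batch-norm idea described above, here is a minimal NumPy sketch. Only the four width multipliers come from the abstract; the layer sizes, class name, and the fully connected stand-in for the detector's convolutions are hypothetical simplifications (the thesis applies this to a MobileNetV2 SSDLite in a real deep-learning framework):

```python
import numpy as np

WIDTHS = [0.25, 0.5, 0.75, 1.0]  # width multipliers from the abstract

class SlimmableLinear:
    """Toy slimmable layer: every width slices the leading rows/columns
    of ONE shared weight matrix, so no extra weights are stored.
    Only the batch-norm statistics are kept private per width."""

    def __init__(self, in_max, out_max):
        rng = np.random.default_rng(0)
        self.W = rng.standard_normal((out_max, in_max)) * 0.01  # shared weights
        self.in_max, self.out_max = in_max, out_max
        # Width-private (mean, var) pairs: switching widths changes the
        # activation statistics, so each width normalizes with its own.
        self.bn_stats = {w: (np.zeros(int(out_max * w)),
                             np.ones(int(out_max * w))) for w in WIDTHS}

    def forward(self, x, width):
        n_in = int(self.in_max * width)
        n_out = int(self.out_max * width)
        y = x[:, :n_in] @ self.W[:n_out, :n_in].T  # sliced shared weights
        mean, var = self.bn_stats[width]           # width-private BN stats
        return (y - mean) / np.sqrt(var + 1e-5)

layer = SlimmableLinear(in_max=64, out_max=32)
x = np.ones((4, 64))
for w in WIDTHS:
    print(w, layer.forward(x, w).shape)  # output width shrinks with w
```

Because all widths index into the same `W`, selecting a narrower configuration at run time costs nothing beyond picking the slice, which is what enables per-sample width switching.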
For evaluation, we use a Cityscapes-derived detection benchmark: each 2048×1024 image is converted to COCO format and split into eight non-overlapping 512×512 tiles, with extensive photometric and geometric augmentations. As a reference, a non-slimmable MobileNetV2 SSDLite-512, which retains the same gate as the slimmable model but is fixed to the full width, achieves 21.57 mAP at a computational cost of 1.76 GMAC per forward pass. Our slimmable model spans a smooth accuracy–efficiency frontier: at 1.0× it reaches 21.96 mAP at 1.44 GMAC per forward pass; at 0.75× it attains 14.12 mAP at 0.94 GMAC (–34.7% vs 1.0×); and at 0.25× it yields 2.09 mAP at 0.38 GMAC (–73.6%). The adaptive mode, leveraging the trained gate, skips 1884/4000 images (47.1%), correctly flags about 57% of empty images, and maintains 16.64 mAP while reducing the average cost to 0.85 GMAC (–41.0% vs 1.0×). Overall, the proposed detector combines the flexibility of slimmable design with adaptive early-exit strategies: it allows instant switching between operating points, either user-controlled or automatically chosen, and achieves substantial compute savings in real-time drone workloads with controlled accuracy loss. Results indicate that slimmable and input-adaptive networks offer a unified and practical approach to deploying deep learning on resource-limited platforms.
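The tiling step of the evaluation protocol can be sketched as follows; the function name is hypothetical, but the geometry (a 2048×1024 Cityscapes frame cut into eight non-overlapping 512×512 tiles) matches the abstract:

```python
import numpy as np

def tile_image(img, tile=512):
    """Split an H×W×C image into non-overlapping tile×tile patches,
    row-major. A 1024×2048 frame yields 2×4 = 8 tiles."""
    h, w = img.shape[:2]
    assert h % tile == 0 and w % tile == 0, "image must divide evenly"
    return [img[r:r + tile, c:c + tile]
            for r in range(0, h, tile)
            for c in range(0, w, tile)]

img = np.zeros((1024, 2048, 3), dtype=np.uint8)  # H×W×C Cityscapes frame
tiles = tile_image(img)
print(len(tiles))  # → 8
```

Non-overlapping tiles keep each detector input at the native 512×512 resolution without resizing, so small objects in the full frame are not shrunk away; the price is that objects straddling a tile boundary are split across inputs.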

Supervisors: Daniele Jahier Pagliari, Alessio Burrello, Beatrice Alessandra Motetti
Academic year: 2025/26
Publication type: Electronic
Number of pages: 88
Subjects:
Degree programme: not specified
Degree class: New regulations > Master's degree > LM-32 - Computer Engineering
Partner companies: not specified
URI: http://webthesis.biblio.polito.it/id/eprint/37877