Davide Buoso
Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval.
Supervisors: Giuseppe Bruno Averta, Philipp Thorr, Daniele De Martini, Tim Franzmeyer. Politecnico di Torino, Master's degree course in Data Science And Engineering, 2024
PDF (Tesi_di_laurea)
- Thesis (26MB)
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
This study explores the potential of off-the-shelf Vision-Language Models (VLMs) for high-level robot planning in the context of autonomous navigation. We introduce Select2Plan (S2P), a novel, training-free framework for high-level robot planning that requires no fine-tuning or specialized training. By leveraging structured Visual Question-Answering (VQA) and In-Context Learning (ICL), our approach significantly reduces the need for data collection: it requires only a minimal fraction (around 0.0005%) of the data typically used by trained models, or can even rely solely on videos from the internet. Our methodology enables the effective use of a generally trained VLM in a flexible and cost-efficient way, and requires no sensing beyond a monocular camera.
We demonstrate its adaptability across various scene types, context sources, and even setups
