Davide Buoso
Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval.
Supervisors: Giuseppe Bruno Averta, Philipp Thorr, Daniele De Martini, Tim Franzmeyer. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2024
PDF (Tesi_di_laurea), 26MB
- Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
This study explores the potential of off-the-shelf Vision-Language Models (VLMs) for high-level robot planning in autonomous navigation. We introduce Select2Plan (S2P), a training-free framework for high-level robot planning that requires neither fine-tuning nor specialized training. By combining structured Visual Question-Answering (VQA) with In-Context Learning (ICL), our approach greatly reduces the need for data collection: it requires only a minimal fraction (around 0.0005%) of the data typically used by trained models, or can rely solely on videos from the internet. The method makes flexible, cost-efficient use of a generally trained VLM and needs no sensing beyond a monocular camera. We evaluate it in two distinct scenarios, traditional First-Person View (FPV) navigation and infrastructure-driven Third-Person View (TPV) navigation, and demonstrate its flexibility, simplicity, and adaptability across scene types, context sources, and even setups. S2P improves the navigational capabilities of the baseline VLM by approximately 50% in the TPV scenario and competes with trained models in the FPV scenario, with as few as one demonstration per object recorded. |
---|---|
Supervisors: | Giuseppe Bruno Averta, Philipp Thorr, Daniele De Martini, Tim Franzmeyer |
Academic year: | 2024/25 |
Publication type: | Electronic |
Number of Pages: | 83 |
Subjects: | |
Degree programme: | Master's degree programme in Data Science and Engineering |
Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
Joint supervision institution: | University of Oxford (United Kingdom) |
Collaborating companies: | University of Oxford |
URI: | http://webthesis.biblio.polito.it/id/eprint/33028 |