Andrea Delli
Environment and Embodiment adaptation of Vision-Language-Action models for robotic manipulation.
Supervisors: Giuseppe Bruno Averta, Davide Buoso, Francesca Pistilli. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025
PDF (Tesi_di_laurea)
- Thesis
Access restricted to: staff users only until 12 June 2027 (embargo date). License: Creative Commons Attribution Non-commercial No Derivatives. Download (19MB)
Abstract
Vision-Language-Action (VLA) models represent a recent and promising direction in robotics, enabling agents to understand natural language instructions, perceive complex visual scenes, and perform manipulation tasks. However, these models often struggle to generalize across different robotic embodiments and environments, as changes in camera viewpoints, kinematics, or action spaces introduce significant distribution shifts. This thesis investigates the problem of robotic embodiment adaptation by evaluating the performance and adaptability of existing pre-trained VLA models on diverse robotic setups. The study focuses on fine-tuning and assessing multiple state-of-the-art VLA architectures (Diffusion Policy, OpenVLA, OpenVLA-OFT, SmolVLA, GR00T, and π0) using imitation learning. Data were collected primarily in simulation with the RLBench environment, which provides standardized tasks for the 7-DoF Franka Panda arm, and the models were further validated on a 6-DoF real-world manipulator developed by the DIANA student team.
In total, approximately 500 simulated episodes and 50 real demonstrations were gathered
