Francesco Antonio Novia
Vision-Language-Action models for industrial robotics.
Supervisor: Alessandro Rizzo. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2026
PDF (Tesi_di_laurea) - Thesis (20MB)
License: Creative Commons Attribution Non-commercial No Derivatives.
Archive (ZIP) (Documenti_allegati) - Other (33MB)
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Recent developments in Vision-Language-Action (VLA) models represent a remarkably innovative approach in the field of robotics. This class of AI models promises to play a notable role in robotics research, analogous to the deep innovation that foundation models have brought to a large part of modern AI technologies. By combining multimodal input understanding, the generative capabilities of LLMs, and an effective translation layer for performing real-world actions, VLAs aim to embody a unified physical intelligence that can readily transfer to different, unseen embodiments and increase the versatility and robustness of modern autonomous robotic systems. Adapting and fine-tuning these models for various types of tasks and environments can potentially enhance the capabilities of common robotic systems, such as industrial manipulators, and let users control them in a more natural way.
The aim of this thesis is to explore and integrate existing state-of-the-art (SoTA) VLA models for industrial applications, specifically context-aware manipulation and pick-and-place use cases.
