
Cross-Embodiment Policy Learning for Robotic Manipulation

Federico Morro

Supervisors: Giuseppe Bruno Averta, Zhenshan Bing. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Recent advances in machine learning have demonstrated the potential of knowledge transfer and multi-task learning to enhance the performance and generalization capabilities of models across various domains. In robotics, the ability to transfer skills between different embodiments is particularly appealing, as it can significantly reduce the time and resources required for training control agents while also improving their adaptability and robustness. This thesis investigates how to leverage demonstrations and Reinforcement Learning (RL) to train agents capable of solving diverse manipulation tasks using multiple robotic arms and grippers. The proposed method uses a contrastive supervised learning approach to construct a shared representation of different robotic configurations and tasks, aligning state and action spaces across diverse embodiments while preserving the distinctions necessary for accurate task execution. The approach employs a language-conditioned vision-based policy, which poses significant challenges but offers greater applicability to real-world scenarios. Additionally, to construct the dataset for the vision-based agent, the thesis introduces a novel methodology that leverages demonstrations from a single embodiment to accelerate and improve the learning of state-based RL policies for diverse embodiments. This is achieved by re-rolling aligned demonstrations and incorporating an advantage-weighted behavioral cloning term into the RL training process. To assess the effectiveness of the proposed methods, extensive experiments are conducted in simulated environments using the MuJoCo physics engine. The state-based RL agents exhibit accelerated learning and improved performance, demonstrating the effectiveness of the knowledge transfer pipeline.
The vision-based policy achieves significant task performance across diverse embodiments and shows promising generalization capabilities to unseen robot-gripper configurations. However, when addressing more complex tasks, certain limitations become evident, particularly in the vision-based policy. In these cases, performance tends to decrease for grippers that require precise object interaction or exhibit morphologies substantially different from those encountered during training. Overall, these findings provide valuable insights into the strengths and limitations of the proposed approaches and suggest potential directions for future research.
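
As a rough illustration of the advantage-weighted behavioral cloning term mentioned in the abstract, the sketch below shows one common way such a term can be formed. The abstract does not give the exact formulation, so everything here is an assumption: the exponential weighting exp(A / temperature), the clipping value, the squared-error imitation loss, and the names `advantage_weights`, `weighted_bc_loss`, `combined_loss`, and `bc_coef` are hypothetical and are not taken from the thesis.

```python
import math


def advantage_weights(advantages, temperature=1.0, max_weight=20.0):
    """Assumed weighting: exp(A / temperature), clipped at max_weight for stability."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return [min(math.exp(a / temperature), max_weight) for a in advantages]


def weighted_bc_loss(policy_actions, demo_actions, advantages,
                     temperature=1.0, max_weight=20.0):
    """Batch mean of w_i * ||policy_action_i - demo_action_i||^2.

    Demonstration samples with a higher estimated advantage pull the
    policy toward the demonstrated action more strongly.
    """
    if not advantages:
        raise ValueError("batch must not be empty")
    weights = advantage_weights(advantages, temperature, max_weight)
    total = 0.0
    for pred, target, w in zip(policy_actions, demo_actions, weights):
        squared_error = sum((p - t) ** 2 for p, t in zip(pred, target))
        total += w * squared_error
    return total / len(weights)


def combined_loss(rl_loss, bc_loss, bc_coef=0.5):
    """Assumed combination: RL objective plus a scaled imitation term."""
    return rl_loss + bc_coef * bc_loss


# Toy batch: two demonstration samples with 2-D actions.
policy_actions = [[1.0, 0.0], [0.0, 0.0]]
demo_actions = [[0.0, 0.0], [0.0, 0.0]]
advantages = [0.0, 0.0]  # zero advantage gives weight 1 for each sample

bc = weighted_bc_loss(policy_actions, demo_actions, advantages)
total = combined_loss(rl_loss=1.0, bc_loss=bc, bc_coef=0.5)
```

In this toy batch both weights equal 1, so the imitation term reduces to a plain mean squared error; raising the advantage of a sample increases its influence up to the clipping value.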

Supervisors: Giuseppe Bruno Averta, Zhenshan Bing
Academic year: 2025/26
Publication type: Electronic
Number of pages: 97
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New system > Master's degree > LM-32 - Computer Engineering
Joint supervision institution: Technical University of Munich (Germany)
Collaborating organizations: Technical University of Munich
URI: http://webthesis.biblio.polito.it/id/eprint/38674