Federico Pretini
Deep Reinforcement Learning for Robotic Manipulation on UR10e: From Simulation to Real Deployment.
Advisor: Giuseppe Bruno Averta. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025
PDF (Tesi_di_laurea), Thesis, 18MB. License: Creative Commons Attribution.
Abstract:
Robotic manipulation is widely adopted in industry, but it typically assumes predictable, tightly structured workspaces in which classical motion-planning pipelines can operate safely and failures remain rare. When a failure does occur, e.g., a missed grasp, object slippage, or an unforeseen perturbation, it must be explicitly detected and handled, which makes this approach fragile in unstructured settings. In parallel, advances in hardware, such as graphics processing units (GPUs), and in machine learning have made Reinforcement Learning (RL) a practical option for learning, via trial and error, policies that directly plan and control motion while exhibiting recovery behaviors. Unlike classical pipelines, an RL policy acts as a closed-loop controller that adapts online to perturbations and unexpected events without exhaustively hard-coding failure cases.

This thesis presents an end-to-end pipeline for training, simulation-based validation, and real-robot deployment of manipulation policies for a UR10e with a Robotiq 2F-140 gripper. In simulation, training is performed in Isaac Lab/Sim using the Proximal Policy Optimization (PPO) algorithm implemented in the RSL-RL library on three tasks of increasing complexity (Reach, Lift, OpenDrawer). The approach leverages manager-based environments for reusability and portability, and domain randomization to improve policy robustness. Intermediate validation uses a containerized digital environment based on URSim, orchestrated with Docker and ROS 2, which mirrors the controller-level stack and enables deterministic policy replay without hardware. Finally, deployment to the physical robot reuses the same code as the simulation validation, with minor adjustments for gripper integration, on a host running a real-time Linux kernel to improve timing determinism.

The results demonstrate successful transfer of the Reach and Lift policies, with effective execution of both tasks on the real robot. The OpenDrawer task shows promising initial results, with successful policy execution in simulation indicating potential for further development. Beyond demonstrating a zero-shot RL pipeline from simulation to reality, this work highlights that sequential reward terms can interact and conflict as task complexity increases. Future work will therefore explore Imitation Learning and Inverse Reinforcement Learning to reduce manual reward design, and will assess UR's new direct torque control interface for torque-level, contact-aware control.
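The training side of the pipeline relies on PPO as implemented in RSL-RL. As a purely illustrative aid, the sketch below shows the clipped-surrogate loss that PPO optimizes, written in plain PyTorch; it is not the RSL-RL code used in the thesis, and the function and parameter names (`ppo_clipped_loss`, `clip_eps`, `value_coef`, `entropy_coef`) are hypothetical.

```python
import torch
import torch.nn.functional as F

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages,
                     values, returns, entropy,
                     clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """PPO loss on a batch of transitions: clipped surrogate objective
    plus value regression, minus an entropy bonus. Inputs are 1-D tensors."""
    # Probability ratio between the updated policy and the policy
    # that collected the rollout data.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Clipped surrogate: keep the pessimistic of the two estimates so a
    # single update cannot move the policy too far from the data-collecting one.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()
    # Value-function regression toward the empirical returns.
    value_loss = F.mse_loss(values, returns)
    # Entropy bonus encourages exploration, especially early in training.
    return policy_loss + value_coef * value_loss - entropy_coef * entropy.mean()
```

In practice this loss is minimized over several epochs of minibatch updates on rollouts collected from many parallel Isaac Lab environments, with advantages typically estimated via generalized advantage estimation (GAE).

On the deployment side, the abstract describes deterministic policy replay through ROS 2 against URSim and, later, the real UR10e. The following is a minimal sketch of what such a replay node could look like with `rclpy`, assuming a TorchScript-exported policy and a standard `ros2_control` forward position controller; the topic names, observation layout, and file name are assumptions, not the thesis code.

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import JointState
from std_msgs.msg import Float64MultiArray
import torch


class PolicyReplayNode(Node):
    """Runs a trained policy as a closed-loop joint-position controller."""

    def __init__(self):
        super().__init__("policy_replay")
        # TorchScript export of the trained policy (hypothetical file name).
        self.policy = torch.jit.load("policy.pt")
        self.latest_joint_state = None
        self.create_subscription(JointState, "/joint_states",
                                 self.joint_state_cb, 10)
        # Command topic of a ros2_control forward position controller
        # (assumed controller configuration for the UR driver).
        self.cmd_pub = self.create_publisher(
            Float64MultiArray, "/forward_position_controller/commands", 10)
        # Evaluate the policy at a fixed control rate (50 Hz here).
        self.create_timer(0.02, self.step)

    def joint_state_cb(self, msg):
        self.latest_joint_state = msg

    def step(self):
        if self.latest_joint_state is None:
            return
        # The real observation vector is task-specific; joint positions
        # alone are used here only to keep the sketch short.
        obs = torch.tensor(self.latest_joint_state.position,
                           dtype=torch.float32).unsqueeze(0)
        with torch.no_grad():
            action = self.policy(obs).squeeze(0)
        self.cmd_pub.publish(Float64MultiArray(data=action.tolist()))


def main():
    rclpy.init()
    rclpy.spin(PolicyReplayNode())


if __name__ == "__main__":
    main()
```

A node of this kind illustrates the property the abstract highlights: because URSim mirrors the controller-level stack, the same replay code can be exercised against the container before being pointed at the real hardware.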
| Advisor: | Giuseppe Bruno Averta |
|---|---|
| Academic year: | 2025/26 |
| Publication type: | Electronic |
| Number of pages: | 109 |
| Subjects: | |
| Degree programme: | Master's degree programme in Computer Engineering (Ingegneria Informatica) |
| Degree class: | New regulations > Master's degree > LM-32 - INGEGNERIA INFORMATICA |
| Collaborating companies: | SANTER Reply S.p.a. |
| URI: | http://webthesis.biblio.polito.it/id/eprint/38675 |