Constrained Reinforcement Learning for Safe Quadruped Robot Locomotion

Paolo Magliano

Constrained Reinforcement Learning for Safe Quadruped Robot Locomotion.

Supervisor: Raffaello Camoriano. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

Abstract:	In recent years, robotic locomotion has advanced considerably, allowing legged robots to walk across a wide range of terrains. Many of the most successful approaches use Reinforcement Learning (RL), a framework in which a robot learns behaviors that achieve a goal by trial and error, given only a specification of the desired objective. Despite its impressive capabilities, RL has limitations. A main concern is the lack of safety, both during learning and when the resulting policy is deployed. In real-world scenarios, unsafe behavior can damage the robot, harm its surroundings, or put humans at risk, so ensuring safety is a fundamental requirement for deploying RL-based strategies on physical robots. Safe Reinforcement Learning addresses this issue by introducing constraints that encourage safer and more controlled behavior, not only after training but ideally also during learning, while retaining the advantages of traditional RL. Most current research trains in simulation and then transfers the policy to the physical robot using Sim-to-Real techniques. Reaching a sufficient level of safety during learning, however, opens up the possibility of training directly on real robots, avoiding the mismatches introduced by simulation-to-reality transfer. The thesis explores state-of-the-art Safe RL methods for quadruped locomotion as a way to incorporate safety constraints into the learning process. Specifically, it examines the ATACOM method, which transforms a constrained RL problem into an unconstrained one by mapping the action space onto a manifold, so that the available actions are restricted according to the desired safety requirements. These constraints are derived from the robot model using the kinematic and geometric properties provided by Lie theory.
The study investigates the effectiveness of several safety constraints and analyzes their impact on task learning and locomotion performance. The goal is to determine which constraints support learning, identify the boundary conditions that still allow successful walking, and observe how the walking style adapts under different configurations. The constraints considered are joint position limits, minimum and maximum height of the robot base, and desired foot position and orientation. The thesis compares the constrained learning approach enabled by ATACOM with state-of-the-art unconstrained locomotion policies, evaluating trade-offs in performance, safety, and robustness. It also extends ATACOM to make it more effective on this task, by redesigning how the action space is morphed to ensure safety and by correcting unsafe actions with an improved error-correction approach.
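As an illustration of what mapping the action space onto a constraint manifold can look like, the following is a minimal sketch for the joint position limits mentioned in the abstract. It is not taken from the thesis, whose full text is not available in this record. It assumes a slack-variable formulation in which each inequality k(q) <= 0 becomes an equality c(q, mu) = k(q) + 0.5 * mu^2 = 0, and in which the agent's action alpha selects a velocity in the null space of the constraint Jacobian. All function names, the gain, and the numeric values are hypothetical.

```python
import numpy as np


def null_space_basis(J, tol=1e-10):
    """Return an orthonormal basis of the null space of J, computed via SVD."""
    _, s, vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol))
    return vt[rank:].T  # columns span {v : J v = 0}


def joint_limit_constraints(q, q_min, q_max):
    """Stack the inequalities k(q) <= 0 for upper and lower joint limits.

    Returns k(q) and its (constant) Jacobian with respect to q.
    """
    k = np.concatenate([q - q_max, q_min - q])
    eye = np.eye(q.size)
    J_k = np.vstack([eye, -eye])
    return k, J_k


def tangent_space_velocity(q, mu, alpha, q_min, q_max, gain=1.0):
    """Map an unconstrained action alpha to velocities for q and the slacks mu.

    Assumed formulation (hypothetical, for illustration only):
    c(q, mu) = k(q) + 0.5 * mu**2 = 0, with Jacobian J_c = [J_k, diag(mu)].
    The velocity is a null-space motion plus a term that pulls c back to zero.
    """
    k, J_k = joint_limit_constraints(q, q_min, q_max)
    c = k + 0.5 * mu**2
    J_c = np.hstack([J_k, np.diag(mu)])
    N = null_space_basis(J_c)
    v = N @ alpha - gain * np.linalg.pinv(J_c) @ c
    return v[: q.size], v[q.size:], c, J_c


# Hypothetical two-joint example, starting exactly on the manifold (c = 0).
q = np.array([0.2, -0.3])
q_min = np.full(2, -1.0)
q_max = np.full(2, 1.0)
k0, _ = joint_limit_constraints(q, q_min, q_max)
mu = np.sqrt(-2.0 * k0)       # slack values chosen so that c(q, mu) = 0
alpha = np.array([0.5, -1.0])  # stand-in for the action produced by the policy
dq, dmu, c, J_c = tangent_space_velocity(q, mu, alpha, q_min, q_max)
```

In this sketch, any alpha yields a velocity that keeps c(q, mu) unchanged to first order, which is why the learning problem seen by the agent can be treated as unconstrained. The correction term only matters when numerical drift moves the state off the manifold.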
Supervisors:	Raffaello Camoriano
Academic year:	2024/25
Publication type:	Electronic
Number of pages:	80
Additional information:	Confidential thesis. Full text not available
Subjects:
Degree programme:	Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class:	New regulations (Nuovo ordinamento) > Master's degree > LM-32 - Computer Engineering
Joint supervision institution:	Intelligent Autonomous Systems Group, Computer Science Department, Technische Universität Darmstadt (Germany)
Collaborating companies:	Technische Universität Darmstadt
URI:	http://webthesis.biblio.polito.it/id/eprint/36416
