
Inverse Reinforcement Learning for Mastering Long-Horizon Procedural Tasks from Visual Demonstrations

Luca Ianniello


Supervisors: Giuseppe Bruno Averta, Andrea Protopapa, Francesca Pistilli. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2025

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Download (6MB)
Abstract:

Robotic manipulation is one of the most challenging domains in robotics, requiring precise coordination and adaptability to complex environments. While reinforcement learning approaches show promise, they face significant limitations in practice: reward engineering is prohibitively complex, exploration in high-dimensional spaces is inefficient, and training on physical robots requires extensive resources. Imitation Learning (IL), and in particular Inverse Reinforcement Learning (IRL), offers an alternative by learning directly from demonstrations rather than from explicit reward signals. However, current IRL approaches face several fundamental challenges in robotic manipulation. Long-horizon tasks with multiple sequential stages are difficult to learn end-to-end due to sparse rewards and temporal complexity. Moreover, the effectiveness of different visual representation learning architectures for IRL in manipulation contexts remains under-explored, especially when combined with procedural decomposition strategies.

In this thesis, we investigate how IRL can be enhanced through procedural decomposition for complex robotic manipulation tasks. We present a comprehensive evaluation of state-of-the-art visual representation learning models and IRL algorithms within the X-Magical benchmark. Our main contribution is the exploration of procedural learning strategies that decompose long-horizon tasks into manageable subtasks, enabling more efficient and effective learning. In addition, we introduce a novel active exploration strategy that augments the reinforcement learning reward with a component based on the estimated distance between the embedding of the current frame and the embeddings of previous frames within the same subtask, thereby promoting more effective exploration.

Our results indicate that the optimal number of procedural steps is task-dependent, with a balanced decomposition yielding the best outcomes. Furthermore, the proposed active exploration strategy improves learning efficiency and task success rates across varying numbers of subtasks and random seeds.
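The embedding-distance exploration bonus described above can be sketched as follows. This is a minimal illustrative sketch, not the thesis's exact formulation: the function name `exploration_bonus`, the use of Euclidean distance, the minimum-over-buffer aggregation, and the `scale` parameter are all assumptions made here for clarity.

```python
import numpy as np

def exploration_bonus(current_emb, prev_embs, scale=1.0):
    """Reward bonus proportional to how far the current frame embedding
    lies from the embeddings already visited in the same subtask.

    Illustrative sketch: the thesis's actual distance estimate and
    aggregation may differ from this minimum Euclidean distance.
    """
    if len(prev_embs) == 0:
        # No history yet in this subtask: no exploration signal.
        return 0.0
    # Distance from the current embedding to each stored embedding.
    dists = np.linalg.norm(np.asarray(prev_embs) - current_emb, axis=1)
    # Reward the minimum distance, so the agent earns a bonus only for
    # reaching states unlike anything seen so far in this subtask.
    return scale * float(dists.min())
```

Added to the IRL reward at each step, a bonus of this shape pushes the policy toward visually novel states within the current subtask while leaving already-explored regions unrewarded.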

Supervisors: Giuseppe Bruno Averta, Andrea Protopapa, Francesca Pistilli
Academic year: 2025/26
Publication type: Electronic
Number of pages: 88
Subjects:
Degree programme: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Degree class: New regulations > Master's degree > LM-32 - INGEGNERIA INFORMATICA
Partner companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/37682