Real-World Fine-Tuning of Diffusion Policies for Autonomous Exploration Using Reinforcement Learning and Human Demonstrations

Alessandro De Marco

Supervisors: Raffaello Camoriano, Luca Benini, Michele Magno. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

Abstract

Autonomous exploration is a fundamental challenge in robotics, with broad implications for operations in remote or hazardous environments. Diffusion policies, generative models that can predict robot actions, have emerged as powerful tools for navigation. However, these models are typically trained with imitation learning (IL) and often fail to generalize beyond their demonstrations. Furthermore, fine-tuning diffusion policies with reinforcement learning (RL) is challenging, as backpropagating through the denoising chain is nontrivial and sample collection in the real world is costly. This thesis addresses these challenges by adapting Q-weighted Variational Policy Optimization (QVPO) to fine-tune Navigation with Goal Masked Diffusion (NoMaD), a state-of-the-art diffusion-based navigation model that unifies goal-conditioned navigation and task-agnostic exploration through goal masking, predicting multimodal action sequences directly from past RGB frames.

The fine-tuning is guided by an external critic that evaluates the sampled trajectories and reweights the diffusion loss according to their Q-values, enabling RL-based fine-tuning without traversing the denoising process.
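
The reweighting idea can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function names (`q_weights`, `q_weighted_diffusion_loss`), the noise schedule `alpha_bar`, the `predict_noise` callable, and the weighting rule (positive advantages over the batch mean, normalized to sum to one) are all assumptions chosen for illustration, and the weighting actually used by QVPO may differ. The sketch only shows why no backpropagation through the denoising chain is needed: each sampled action sequence is noised at a single random timestep, and its ordinary noise-prediction error is scaled by a weight derived from the critic's Q-value.

```python
import numpy as np


def q_weights(q_values):
    """Map critic Q-values to non-negative weights that sum to one.

    Illustrative rule (an assumption, not necessarily QVPO's): keep only
    the positive part of the advantage over the batch mean.
    """
    q_values = np.asarray(q_values, dtype=float)
    advantages = q_values - q_values.mean()
    weights = np.maximum(advantages, 0.0)
    total = weights.sum()
    if total == 0.0:
        # All samples are equally good: fall back to uniform weights.
        return np.full(q_values.shape, 1.0 / q_values.size)
    return weights / total


def q_weighted_diffusion_loss(actions, q_values, predict_noise, alpha_bar, rng):
    """Q-weighted noise-prediction loss for a batch of action sequences.

    actions:       array of shape (batch, ...) with sampled action sequences
    q_values:      array of shape (batch,) from the external critic
    predict_noise: hypothetical callable (noisy_actions, timesteps) -> noise
    alpha_bar:     cumulative noise schedule, one value per diffusion step
    """
    batch = actions.shape[0]
    # One random diffusion step per sample, as in standard diffusion training.
    t = rng.integers(0, len(alpha_bar), size=batch)
    noise = rng.standard_normal(actions.shape)
    a_bar = alpha_bar[t].reshape(batch, *([1] * (actions.ndim - 1)))
    noisy = np.sqrt(a_bar) * actions + np.sqrt(1.0 - a_bar) * noise
    # Per-sample mean squared error between predicted and true noise.
    error = (predict_noise(noisy, t) - noise) ** 2
    per_sample = error.reshape(batch, -1).mean(axis=1)
    # Samples the critic rates highly contribute more to the loss.
    return float((q_weights(q_values) * per_sample).sum())
```

In a real training loop the same weighted loss would be written in an automatic-differentiation framework so that gradients reach the denoising network; here `predict_noise` is only a placeholder for that network.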