Adaptivity of Markovian and History-Based Reinforcement Learning Policies in Environments with Latent Dynamic Parameters

Francesco Giacometti

Adaptivity of Markovian and History-Based Reinforcement Learning Policies in Environments with Latent Dynamic Parameters.

Supervisors: Giuseppe Bruno Averta, Gabriele Tiboni. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2025

Abstract

Simulated environments with latent dynamic parameters are used in robotics to train controllers with Reinforcement Learning by sampling the parameters at the start of each episode, a technique known as Domain Randomization. Controllers trained this way have been shown to transfer more smoothly from simulation to the real environment than controllers trained in non-randomized environments. The Domain Randomization literature draws an established connection between adaptivity to the environment and history-based policies, whereas Markovian policies are assumed not to be adaptive and are often characterized as robust. Adaptivity in this context refers to the ability to infer the value of the dynamic parameters and to deploy a strategy that is optimal for the inferred value.

While it is true that in environments with latent parameters the optimal policy is not guaranteed to be Markovian, because the environment is not Markovian relative to the observed state, we challenge the notion that Markovian policies cannot show adaptive behavior.
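The Domain Randomization setup described in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical toy and is not taken from the thesis: the `PointMassEnv` class, the choice of `mass` as the latent dynamic parameter, its sampling range, and the proportional-derivative gains in `markovian_policy` are all assumptions made for illustration. The sketch only shows the two ingredients named in the abstract: a latent parameter resampled at the start of each episode, and a policy that depends on the current observation alone.

```python
import random


class PointMassEnv:
    """Hypothetical 1-D point mass; `mass` is the latent dynamic parameter."""

    def __init__(self, mass_range=(0.5, 2.0), dt=0.05, horizon=100, seed=None):
        self.mass_range = mass_range
        self.dt = dt
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self):
        # Domain Randomization: resample the latent parameter once per episode.
        self.mass = self.rng.uniform(*self.mass_range)
        self.pos, self.vel, self.t = 1.0, 0.0, 0
        # The observation contains position and velocity only, never the mass.
        return (self.pos, self.vel)

    def step(self, force):
        # The dynamics depend on the hidden mass (semi-implicit Euler step).
        self.vel += (force / self.mass) * self.dt
        self.pos += self.vel * self.dt
        self.t += 1
        reward = -abs(self.pos)  # reward is higher the closer we stay to 0
        done = self.t >= self.horizon
        return (self.pos, self.vel), reward, done


def markovian_policy(obs):
    """Markovian: the action is a function of the current observation only."""
    pos, vel = obs
    return -4.0 * pos - 2.0 * vel  # assumed PD gains, chosen arbitrarily


def run_episode(env, policy):
    """Run one episode and return the sampled mass and the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return env.mass, total
```

Calling `run_episode(PointMassEnv(seed=0), markovian_policy)` several times on the same environment object yields a different hidden mass in each episode. A history-based policy would differ only in that its input would be the sequence of past observations and actions rather than the latest observation.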

Supervisors

Giuseppe Bruno Averta, Gabriele Tiboni

Academic Year

2024/25

Publication type

Electronic

Number of pages

Additional information

Thesis under embargo. Full text not available.

Degree programme

Master's degree programme in Data Science And Engineering

Degree class

New system > Master's degree > LM-32 - Computer Engineering

URI

https://webthesis.biblio.polito.it/id/eprint/35267
