Francesco Giacometti
Adaptivity of Markovian and History-Based Reinforcement Learning Policies in Environments with Latent Dynamic Parameters
Supervisors: Giuseppe Bruno Averta, Gabriele Tiboni. Politecnico di Torino, Master's degree program in Data Science And Engineering, 2025
Abstract
In robotics, simulated environments with latent dynamic parameters are used to train controllers with Reinforcement Learning: the parameters are resampled at the start of each episode, a technique known as Domain Randomization. Controllers trained this way have been shown to transfer more smoothly from simulation to the real world than controllers trained in non-randomized environments. In the Domain Randomization literature, we found an established connection between adaptivity to the environment and history-based policies, whereas Markovian policies are assumed not to be adaptive and are often described as robust. In this context, adaptivity refers to the ability to infer the values of the dynamic parameters and to deploy a strategy that is optimal for the inferred values.
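As an illustration of the Domain Randomization loop described above, the following Python sketch resamples a latent parameter at every reset and keeps it out of the observation. The toy 1D point-mass environment (`PointMassEnv`), its mass range, the reward, and the hand-tuned `markovian_policy` are hypothetical and not taken from the thesis; the actual RL training step, which would update the policy from the collected returns, is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class PointMassEnv:
    """Hypothetical toy 1D point mass whose mass is a latent dynamic parameter."""
    mass_range: tuple = (0.5, 2.0)  # assumed randomization range
    dt: float = 0.1
    horizon: int = 50

    def reset(self, rng):
        # Domain Randomization: resample the latent parameter at the start of each episode.
        self.mass = rng.uniform(*self.mass_range)
        self.x, self.v, self.t = 1.0, 0.0, 0
        return (self.x, self.v)  # the mass is NOT part of the observation

    def step(self, action):
        # The action is interpreted as a force; semi-implicit Euler integration.
        self.v += self.dt * action / self.mass
        self.x += self.dt * self.v
        self.t += 1
        reward = -(self.x ** 2)  # penalize distance from the origin
        done = self.t >= self.horizon
        return (self.x, self.v), reward, done


def markovian_policy(obs):
    # Hand-tuned PD law: the action depends only on the current observation.
    x, v = obs
    return -2.0 * x - 1.0 * v


def run_episode(env, policy, rng):
    obs, total, done = env.reset(rng), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


rng = random.Random(0)
env = PointMassEnv()
# Each episode is played with a different, unobserved mass.
returns = [run_episode(env, markovian_policy, rng) for _ in range(5)]
print(returns)
```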
While it is true that in environments with latent parameters the optimal policy is not guaranteed to be Markovian, because the environment is not Markovian with respect to the observed state, we challenge the notion that Markovian policies cannot exhibit adaptive behavior.
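Why the environment is not Markovian with respect to the observed state can be seen on a toy point mass: every episode starts from the same observation, so the mass cannot be read off a single state, yet it can be estimated from the history of transitions. The sketch below is a hypothetical, hand-crafted history-based policy (`HistoryBasedPolicy`, with an assumed step `DT` and PD gains) that estimates the mass by least squares on dv = (DT / mass) * force and rescales its action accordingly, in the sense of adaptivity defined above. It is not the thesis's method and does not test the thesis's claim about Markovian policies.

```python
import random

DT = 0.1  # hypothetical integration step


def step(x, v, force, mass):
    """Toy point-mass dynamics; the mass is the latent parameter."""
    v_next = v + DT * force / mass
    return x + DT * v_next, v_next


class HistoryBasedPolicy:
    """Hypothetical adaptive policy that infers the latent mass from past transitions."""

    def __init__(self, prior_mass=1.0):
        self.prior_mass = prior_mass
        # Running sums for least squares on dv = theta * force, with theta = DT / mass.
        self.sum_f_dv = 0.0
        self.sum_f_f = 0.0
        self.prev = None  # (velocity, force) at the previous step

    def mass_estimate(self):
        if self.sum_f_f == 0.0 or self.sum_f_dv <= 0.0:
            return self.prior_mass  # no informative history yet
        theta = self.sum_f_dv / self.sum_f_f
        return DT / theta

    def act(self, x, v):
        # Use the transition that just happened to update the estimate.
        if self.prev is not None:
            v_prev, f_prev = self.prev
            self.sum_f_dv += f_prev * (v - v_prev)
            self.sum_f_f += f_prev * f_prev
        # Rescale a PD law by the inferred mass; once the estimate is accurate,
        # the closed-loop response no longer depends on the true mass.
        force = self.mass_estimate() * (-2.0 * x - 1.0 * v)
        self.prev = (v, force)
        return force


rng = random.Random(0)
true_mass = rng.uniform(0.5, 2.0)  # latent: never passed to the policy
policy = HistoryBasedPolicy()
x, v = 1.0, 0.0  # same initial observation for every mass
for _ in range(20):
    force = policy.act(x, v)
    x, v = step(x, v, force, true_mass)
print(f"true mass {true_mass:.3f}, estimate {policy.mass_estimate():.3f}")
```

Because these toy dynamics are deterministic and noise-free, the estimate is exact after the first transition; with noisy dynamics the least-squares estimate would only converge gradually.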
