Decentralized Value-Based Reinforcement Learning in Stochastic Potential Games

Hafez Ghaemi

Decentralized Value-Based Reinforcement Learning in Stochastic Potential Games.

Supervisors: Fabio Fagnani, Giacomo Como. Politecnico di Torino, Master's degree program in Data Science And Engineering, 2022


PDF (Tesi_di_laurea) - Thesis, 4MB
License: Creative Commons Attribution Non-commercial No Derivatives.

Archive (ZIP) (Documenti_allegati) - Other, 2MB
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Multi-agent reinforcement learning (MARL) is a promising paradigm for learning problems that involve multiple decision makers. Unlike centralized MARL, which relies on a central controller, decentralized (independent) MARL is more practical in terms of scalability, privacy, and computational cost, yet it is more challenging because the environment is non-stationary from each agent's perspective. This non-stationarity arises because the evolution of the environment and each agent's payoffs depend on the behavior of the other agents. In value-based MARL, two-timescale learning has been shown to address this issue. In such learning dynamics, agents update their value function estimates on a slower timescale than their local Q-function estimates; as a result, the game is rendered locally stationary with respect to the strategies of the other agents. However, two-timescale dynamics in decentralized Q-learning have been studied only in two-player zero-sum games. In this thesis, we focus on an important, recently introduced class of stochastic games: stochastic (Markov) potential games (SPGs). We prove that a many-player extension of the two-timescale decentralized Q-learning algorithm converges asymptotically to a Nash equilibrium, and we evaluate the algorithm's empirical performance on two SPG benchmarks: congestion games and distancing games.
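To make the two-timescale idea concrete, the following Python sketch shows one way independent agents could update a local Q-estimate q_i(s, a_i) on a fast timescale and a value estimate v_i(s) on a slow one. It is a minimal illustration under stated assumptions, not the algorithm analyzed in the thesis: the softmax (smoothed best-response) policy, the temperature tau, the step sizes alpha_k = k^-0.6 and beta_k = 1/k, the toy two-state congestion-style game, and all names (two_timescale_q_learning, congestion_reward, switch_transition, WEIGHTS) are hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, tau):
    """Smoothed best response: Boltzmann distribution over local Q-values."""
    z = (x - x.max()) / tau  # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def two_timescale_q_learning(reward_fn, transition_fn, n_agents, n_states, n_actions,
                             gamma=0.9, tau=0.1, n_steps=10_000, seed=0):
    """Hypothetical sketch: each agent sees the state but only its own action and reward."""
    rng = np.random.default_rng(seed)
    q = np.zeros((n_agents, n_states, n_actions))  # local Q-estimates q_i(s, a_i)
    v = np.zeros((n_agents, n_states))             # value estimates v_i(s)
    visits = np.zeros(n_states, dtype=int)
    s = 0
    for _ in range(n_steps):
        visits[s] += 1
        k = visits[s]
        alpha = 1.0 / k ** 0.6  # fast timescale for the Q-estimates (assumed schedule)
        beta = 1.0 / k          # slow timescale for the values: beta / alpha -> 0
        policies = np.array([softmax(q[i, s], tau) for i in range(n_agents)])
        actions = np.array([rng.choice(n_actions, p=policies[i]) for i in range(n_agents)])
        rewards = reward_fn(s, actions)
        s_next = transition_fn(s, actions, rng)
        for i in range(n_agents):
            a = actions[i]
            # Fast update: the TD target uses the agent's own slowly moving value estimate.
            q[i, s, a] += alpha * (rewards[i] + gamma * v[i, s_next] - q[i, s, a])
            # Slow update: track the expected local Q-value under the current policy.
            v[i, s] += beta * (policies[i] @ q[i, s] - v[i, s])
        s = s_next
    return q, v


# Hypothetical toy game: 2 resources, 2 states that swap which resource is cheap.
WEIGHTS = np.array([[0.5, 1.0], [1.0, 0.5]])  # WEIGHTS[s, resource]


def congestion_reward(s, actions):
    # Each agent pays its resource's weight times the fraction of agents sharing it.
    counts = np.bincount(actions, minlength=2)
    return -WEIGHTS[s, actions] * counts[actions] / len(actions)


def switch_transition(s, actions, rng):
    # Action-independent transitions: the state flips with probability 0.1.
    return 1 - s if rng.random() < 0.1 else s


q, v = two_timescale_q_learning(congestion_reward, switch_transition,
                                n_agents=3, n_states=2, n_actions=2)
print(np.round(v, 2))
```

With beta_k = 1/k and alpha_k = k^-0.6, the ratio beta_k / alpha_k = k^-0.4 tends to zero, so from the point of view of the Q-update the value estimates (and hence the TD targets) are nearly frozen; this is the local stationarity the abstract describes. A faithful implementation would instead follow the exact update rules, step sizes, and exploration scheme specified in the thesis.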
Supervisors:	Fabio Fagnani, Giacomo Como
Academic year:	2021/22
Publication type:	Electronic
Number of pages:	65
Degree program:	Master's degree program in Data Science And Engineering
Degree class:	New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies:	NOT SPECIFIED
URI:	http://webthesis.biblio.polito.it/id/eprint/23450
