Decentralized Value-Based Reinforcement Learning in Stochastic Potential Games

Hafez Ghaemi

Decentralized Value-Based Reinforcement Learning in Stochastic Potential Games.

Supervisors: Fabio Fagnani, Giacomo Como. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2022

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Archive (ZIP) (Documenti_allegati) - Other
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Multi-agent reinforcement learning (MARL) is a promising paradigm for learning problems involving multiple decision makers. In contrast to centralized MARL, which relies on a central controller, decentralized (independent) MARL is more practical in terms of scalability, privacy, and computational cost, yet more challenging because the environment is non-stationary from each agent's perspective. This non-stationarity arises because the evolution of the environment and the agent's payoffs depend on the behavior of the other agents. In value-based MARL, two-timescale learning has been shown to address this issue. In such learning dynamics, agents update their value function estimates on a slower timescale than their local Q-function estimates, so the game is rendered locally stationary with respect to the strategies of the other agents. However, two-timescale dynamics in decentralized Q-learning have been studied only in two-player zero-sum games. In this thesis, we focus on a recently emerged and important class of stochastic games, stochastic (Markov) potential games (SPGs). We prove that a many-player extension of the two-timescale decentralized Q-learning algorithm converges asymptotically to a Nash equilibrium, and we evaluate the empirical performance of the algorithm on two SPG benchmarks: congestion games and distancing games.
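
The two-timescale idea in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the thesis: the class name `TwoTimescaleAgent`, the softmax (smoothed best-response) policy, the step-size schedules, and the single-state coordination demo are all assumptions chosen for illustration. The only property taken from the abstract is that each agent updates its value estimate more slowly than its local Q-function estimate, using only its own observations.

```python
import math
import random


class TwoTimescaleAgent:
    """One independent learner: it sees only its own state, action, and reward."""

    def __init__(self, n_states, n_actions, gamma=0.9, temperature=0.1, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]  # local Q estimates (fast)
        self.v = [0.0] * n_states                               # value estimates (slow)
        self.visits = [0] * n_states
        self.gamma = gamma
        self.temperature = temperature  # assumed softmax temperature
        self.rng = random.Random(seed)

    def policy(self, s):
        # Softmax over the local Q row; subtracting the max keeps exp() stable.
        m = max(self.q[s])
        weights = [math.exp((x - m) / self.temperature) for x in self.q[s]]
        total = sum(weights)
        return [w / total for w in weights]

    def act(self, s):
        probs = self.policy(s)
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, s, a, r, s_next):
        self.visits[s] += 1
        n = self.visits[s]
        alpha = 1.0 / n ** 0.6  # fast step size (assumed schedule)
        beta = 1.0 / n          # slow step size; beta / alpha -> 0 as n grows
        # Fast timescale: temporal-difference update of the local Q estimate.
        self.q[s][a] += alpha * (r + self.gamma * self.v[s_next] - self.q[s][a])
        # Slow timescale: move the value estimate toward the policy-weighted Q.
        probs = self.policy(s)
        expected_q = sum(p * q for p, q in zip(probs, self.q[s]))
        self.v[s] += beta * (expected_q - self.v[s])


def run_coordination_demo(steps=500):
    """Hypothetical single-state, identical-payoff game: reward 1 if actions match."""
    agents = [TwoTimescaleAgent(1, 2, seed=i) for i in range(2)]
    for _ in range(steps):
        actions = [agent.act(0) for agent in agents]
        reward = 1.0 if actions[0] == actions[1] else 0.0
        for agent, action in zip(agents, actions):
            agent.update(0, action, reward, 0)  # no agent observes the other's action
    return agents


if __name__ == "__main__":
    for i, agent in enumerate(run_coordination_demo()):
        print(i, [round(p, 3) for p in agent.policy(0)])
```

An identical-payoff game is used in the demo because it is a simple special case of a potential game; the congestion and distancing benchmarks evaluated in the thesis are not reproduced here.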

Supervisors: Fabio Fagnani, Giacomo Como
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 65
Subjects:
Degree programme: Master's degree programme in Data Science and Engineering
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/23450