Hafez Ghaemi
Decentralized Value-Based Reinforcement Learning in Stochastic Potential Games.
Supervisors: Fabio Fagnani, Giacomo Como. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2022
PDF (Tesi_di_laurea), thesis, 4MB. License: Creative Commons Attribution Non-commercial No Derivatives.
Archive (ZIP) (Documenti_allegati), other attachments, 2MB. License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
Multi-agent reinforcement learning (MARL) is a promising paradigm for learning problems involving multiple decision makers. Compared with centralized MARL, which relies on a central controller, decentralized (independent) MARL is more practical in terms of scalability, privacy, and computational cost, yet more challenging because the environment is non-stationary from each agent’s perspective. This non-stationarity arises because the evolution of the environment and the agent’s payoffs depend on the behavior of the other agents. In value-based MARL, two-timescale learning has been shown to address this issue. Under such learning dynamics, agents update their value function estimates on a slower timescale than their local Q-function estimates, so the game is rendered locally stationary with respect to the strategies of the other agents. However, two-timescale dynamics in decentralized Q-learning have been studied only in two-player zero-sum games. In this thesis, we focus on a recently introduced and important class of stochastic games, stochastic (Markov) potential games (SPG). We prove that a many-player extension of the two-timescale decentralized Q-learning algorithm asymptotically converges to a Nash equilibrium, and we evaluate the empirical performance of the algorithm on two SPG benchmarks, congestion games and distancing games. |
---|---|
Supervisors: | Fabio Fagnani, Giacomo Como |
Academic year: | 2021/22 |
Publication type: | Electronic |
Number of Pages: | 65 |
Subjects: | |
Degree programme: | Master's degree programme in Data Science And Engineering |
Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
Collaborating companies: | UNSPECIFIED |
URI: | http://webthesis.biblio.polito.it/id/eprint/23450 |
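
The two-timescale idea described in the abstract can be illustrated with a minimal sketch of one agent's local update. This is not the thesis's algorithm: the step-size schedules `alpha` and `beta`, the softmax policy with temperature `tau`, and the function name `agent_update` are all assumptions chosen for illustration. The only property taken from the abstract is that the value estimate moves on a slower timescale than the local Q-function estimate, here expressed as `beta(n) / alpha(n)` shrinking toward zero as the visit count `n` grows.

```python
import numpy as np


def alpha(n: int) -> float:
    """Fast step size for the local Q-function (assumed schedule)."""
    return 1.0 / (n + 1) ** 0.6


def beta(n: int) -> float:
    """Slow step size for the value function (assumed schedule).

    beta(n) / alpha(n) = (n + 1) ** -0.4, which tends to zero.
    """
    return 1.0 / (n + 1)


def softmax(q: np.ndarray, tau: float) -> np.ndarray:
    """Boltzmann distribution over actions (an assumed policy choice)."""
    z = (q - q.max()) / tau  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def agent_update(q, v, counts, s, a, r, s_next, gamma=0.9, tau=0.1):
    """One decentralized update for a single agent, modifying q, v, counts in place.

    The agent sees only its own action `a` and reward `r`; it never
    observes the other agents' actions or estimates.
    q:      array of shape (n_states, n_own_actions), local Q estimates
    v:      array of shape (n_states,), value estimates
    counts: array of shape (n_states,), visits to each state so far
    """
    n = counts[s]
    # Fast timescale: move the local Q estimate toward the bootstrapped target.
    target = r + gamma * v[s_next]
    q[s, a] += alpha(n) * (target - q[s, a])
    # Slow timescale: move the value estimate toward the policy-weighted Q.
    pi = softmax(q[s], tau)
    v[s] += beta(n) * (pi @ q[s] - v[s])
    counts[s] += 1
```

In a many-player setting, each agent would hold its own `q`, `v`, and `counts` and call `agent_update` after every joint transition, using only its own reward signal.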