Hafez Ghaemi
Decentralized Value-Based Reinforcement Learning in Stochastic Potential Games.
Supervisors: Fabio Fagnani, Giacomo Como. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2022
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Multi-agent reinforcement learning (MARL) is a promising paradigm for learning problems involving multiple decision makers. In contrast to centralized MARL with a central controller, decentralized (independent) MARL is more practical in terms of scalability, privacy, and computational cost, yet more challenging due to the non-stationarity of the environment from each agent's perspective. Non-stationarity arises because the evolution of the environment and an agent's payoffs depend on the behavior of the other agents. In value-based MARL, two-timescale learning has been shown to address this issue: agents update their value-function estimates on a timescale slower than their local Q-function estimates, so the game is rendered locally stationary with respect to the strategies of the other agents.
However, two-timescale dynamics in decentralized Q-learning have so far been studied only in two-player zero-sum games.
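The two-timescale idea described above can be illustrated with a minimal sketch of a single agent's update loop. This is not the thesis's actual dynamics: the step-size schedules, the toy environment (random rewards and transitions), and all variable names are assumptions chosen only to show the separation of timescales, where the Q-factor update is fast and the value-estimate update is slow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-agent view: a local Q-table and a state-value estimate v,
# updated on two timescales. Step-size schedules are hypothetical:
# beta_t / alpha_t -> 0, so v moves slower than Q.
n_states, n_actions, gamma = 4, 2, 0.9
Q = np.zeros((n_states, n_actions))
v = np.zeros(n_states)

def step_sizes(t):
    alpha = 1.0 / (t + 1) ** 0.6   # fast timescale (Q-factor update)
    beta = 1.0 / (t + 1)           # slow timescale (value update)
    return alpha, beta

s = 0
for t in range(5000):
    a = rng.integers(n_actions)        # exploratory action
    r = rng.random()                   # stand-in for the stage payoff
    s_next = rng.integers(n_states)    # stand-in for the state transition
    alpha, beta = step_sizes(t)
    # Fast update: the local Q-factor bootstraps on the quasi-static v,
    # which looks stationary from the fast timescale's perspective.
    Q[s, a] += alpha * (r + gamma * v[s_next] - Q[s, a])
    # Slow update: v drifts toward the greedy value of the local Q-factors.
    v[s] += beta * (Q[s].max() - v[s])
    s = s_next

print(Q.shape, v.shape)
```

In a decentralized game, each agent would run such a loop independently; because every agent's v changes slowly, the environment each agent faces is approximately stationary between value updates, which is the stabilizing mechanism the abstract refers to.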