Model-Free Multi-Agent Reinforcement Learning Approach in NeurIPS LuxAI S3 Competition

Paolo Rizzo


Supervisors: Daniele Apiletti, Simone Monaco, Daniele Rege Cambrin. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2025


License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	This thesis investigates the application of Multi-Agent Reinforcement Learning (MARL) to the development of a robust, adaptive agent that interacts with a partially observable, continuously evolving environment while competing against other agents to meet the winning conditions. With the widespread adoption of deep learning, Reinforcement Learning (RL) has gained considerable popularity over the last decade, scaling to previously intractable problems such as playing complex games from pixel observations, sustaining conversations with humans, and autonomous driving. However, many domains remain out of reach for RL, either because training is computationally too expensive or because agents fail to converge on complex problems. In this context, the NeurIPS (Conference on Neural Information Processing Systems) LuxAI competition has become a significant event within the scientific community, serving as a platform for advancing research at the intersection of artificial intelligence, robotics, and human-robot interaction. Season 3 of the competition tests how well agents adapt to a game whose dynamics change. In particular, the player agent competes against an opponent agent over several matches, controlling multiple sub-agents and continuously trading off exploration of a randomized, partially observable environment against exploitation of the information gathered so far to maximize the target objective. The thesis first provides an overview of the main challenges of the MARL paradigm, such as non-stationarity, equilibrium selection, credit assignment, and scaling to many agents. It then compares state-of-the-art algorithms and explains the architecture adopted.
In addition, the thesis emphasizes that the model is agnostic and trained with self-play: no prior knowledge is instilled, and strategies are learned independently, in contrast to traditional rule-based models, which rely on human heuristics to reason and act. Finally, the thesis evaluates the model's performance and reports the position it reached in the competition ranking.
Supervisors:	Daniele Apiletti, Simone Monaco, Daniele Rege Cambrin
Academic year:	2024/25
Publication type:	Electronic
Number of pages:	84
Subjects:
Degree programme:	Master's degree programme in Data Science And Engineering
Degree class:	New regulations > Master's degree > LM-32 - INGEGNERIA INFORMATICA (Computer Engineering)
Collaborating companies:	NOT SPECIFIED
URI:	http://webthesis.biblio.polito.it/id/eprint/35364
