Study on reinforcement-learning-based decision-making and planning in the context of non-deterministic scenarios

Alessandro Franco

Supervisors: Giovanni Squillero, Alberto Paolo Tonda. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2023

License: Creative Commons Attribution Non-commercial No Derivatives.

Making a choice is a complex task that involves many other processes, each of which can heavily influence the final decision. Experience plays a fundamental role, both in predicting consequences from similar past situations and in analysing the context in which the decision is made. In real life, even the most carefully thought-out plan must account not only for the actors involved and the situation, but also for a degree of randomness that may critically disrupt it, forcing the actor to adapt to new scenarios and make new choices to complete the task. Games commonly place players in this position: between developing a strategy and enacting it, many decisions are made over the course of a full game, so players who want to win need a consistent strategy while remaining able to improvise when unexpected scenarios arise.

This project aims to develop a reinforcement-learning-based algorithm able to learn the basics of a given game, understand its mechanics and use this knowledge to develop an efficient strategy with the means provided. The code is also tested in a variety of scenarios to verify how it handles unexpected or unprecedented game states and how such instances may affect future strategies. Finally, the algorithm is tested in a non-deterministic environment, which adds an element of randomness to play and forces the code to take random variables into account when planning moves during the game.

Pokémon is a turn-based video game with simultaneous moves in which players battle each other with a set team (of up to 6 Pokémon) until all members of the opposing team are defeated or the opponent concedes. The game also involves various sources of uncertainty, from imperfect information (teams are unknown until the game starts, and different players build teams to suit different strategies, possibly with the same team composition) to non-deterministic behavior (linked to how moves are performed and to possible side effects of different combinations). VGC in particular is the rule set used in international competitions: players select 4 of their 6 available team members, and matches are fought in a 2v2 format.

Although some aspects of the game are random and cannot always be controlled by the player, the results showed that, with enough training, the code was able to take into account the risks of certain moves when devising a strategy, as well as to try to counter the opponent's strategy. Given the adaptability of reinforcement learning algorithms, even instances of the code that did not undergo training delivered interesting results, showing that this approach is viable for further experimentation and improvement.
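As an illustration of what weighing the risk of a move can mean in a non-deterministic setting, the following is a minimal sketch and not the method used in the thesis. The `Move` class, the move names, the damage values and the accuracies are all hypothetical; the sketch simply compares a reliable move with a stronger but less accurate one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    # All fields are hypothetical, chosen only for illustration.
    name: str
    damage: float    # damage dealt when the move hits
    accuracy: float  # probability that the move hits, in [0, 1]


def expected_damage(move: Move) -> float:
    """Average damage of one use of the move."""
    return move.accuracy * move.damage


def knockout_probability(move: Move, opponent_hp: float) -> float:
    """Probability that one use of the move defeats the opponent."""
    return move.accuracy if move.damage >= opponent_hp else 0.0


def best_move(moves, opponent_hp: float) -> Move:
    # Prefer the higher knockout chance; break ties by expected damage.
    return max(
        moves,
        key=lambda m: (knockout_probability(m, opponent_hp), expected_damage(m)),
    )


safe = Move("safe_hit", damage=80.0, accuracy=1.0)
risky = Move("risky_hit", damage=120.0, accuracy=0.7)

print(best_move([safe, risky], opponent_hp=60.0).name)
print(best_move([safe, risky], opponent_hp=100.0).name)
```

Under these assumed numbers, the reliable move is preferred when either move would defeat the opponent, while the riskier move is chosen only when it is the sole option that can end the battle in one turn. A learning agent would have to estimate such trade-offs from experience rather than compute them from known probabilities.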

Supervisors: Giovanni Squillero, Alberto Paolo Tonda
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 47
Degree programme: Master's degree programme in Ingegneria Informatica (Computer Engineering)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/26843