
Autonomous Lunar Lander, Deep Reinforcement Learning for Control application

Matteo Stoisa

Supervisors: Elena Maria Baralis, Lorenzo Feruglio, Mattia Varile, Luca Romanelli. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2021

Abstract:

The thesis analyzes the application of a Deep Reinforcement Learning algorithm to a case study in the field of control. The chosen algorithm is Proximal Policy Optimization (PPO), which in recent years has reached state-of-the-art results in various application fields; the control problem considered is the powered-descent phase of a lunar lander and the subsequent landing in a predetermined area, with the Apollo 11 mission taken as a reference for several aspects of the model. The Unity framework was used for modeling and simulation, and the ML-Agents library for the DRL part. Model fidelity was increased with an incremental approach, which made it possible to gradually understand and address the critical issues, since the difficulty of achieving the desired result grows considerably as the physical complexity of the problem increases. The main problems faced and analyzed concern the interaction between the physical model, the reward function through which the agent learns, and the numerous parameters that govern the PPO algorithm. The main features implemented in the first three scenarios are realistic and randomized initial conditions, realistic landing constraints, limited fuel, and the loss of mass caused by its consumption. In these three scenarios the main simplification is constraining motion to three degrees of freedom; the resulting agents reach the predetermined landing site in more than 90% of cases. In the fourth and final scenario the lander can move in six degrees of freedom; here the best result is about 75% accuracy, but further analyses of the failure cases showed that they can still be considered positive. In addition to the analysis of the factors that did or did not make it possible to reach these goals, three training strategies were theorized and implemented, i.e. manipulations that deviate from the normal training mechanism in order to reduce resource consumption; they proved effective in some cases, but not in the most complex scenario. Overall, the results achieved are considered satisfactory; they can serve as a guideline for applying this methodology to other similar applications or for continuing the development of this case study.
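The abstract names Proximal Policy Optimization as the algorithm driving the lander. As an illustration of the core idea only, the sketch below implements PPO's clipped surrogate objective in plain Python/NumPy; it is not the ML-Agents training code used in the thesis, and the epsilon value and the dummy batch are assumptions made for the example.

import numpy as np

def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    # Probability ratio r_t(theta) = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t),
    # computed from log-probabilities for numerical stability.
    ratio = np.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    # Clipping the ratio to [1 - eps, 1 + eps] limits how far a single update
    # can move the policy away from the one that collected the data.
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # The objective is the mean of the element-wise pessimistic (minimum) bound;
    # a trainer would maximize this quantity (or minimize its negative).
    return np.mean(np.minimum(unclipped, clipped))

# Example with dummy data for a batch of 4 timesteps.
old_lp = np.array([-1.2, -0.8, -1.5, -0.9])
new_lp = np.array([-1.0, -0.9, -1.4, -0.7])
adv    = np.array([ 0.5, -0.2,  1.0,  0.3])
print(ppo_clipped_objective(new_lp, old_lp, adv))

In practice, ML-Agents drives PPO training from a YAML configuration file (clipping range, learning rate, reward-signal settings, and so on), which corresponds to the "numerous parameters that govern the PPO algorithm" discussed in the abstract; the sketch only shows the mathematical core of the update.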

Supervisors: Elena Maria Baralis, Lorenzo Feruglio, Mattia Varile, Luca Romanelli
Academic year: 2021/22
Publication type: Electronic
Number of pages: 160
Additional information: Restricted (confidential) thesis. Full text not available
Subjects:
Degree programme: Master's degree programme in Ingegneria Informatica (Computer Engineering)
Degree class: New regulations > Master's degree > LM-32 - INGEGNERIA INFORMATICA
Collaborating companies: AIKO S.R.L.
URI: http://webthesis.biblio.polito.it/id/eprint/21222