Safe Exploration with Safety Layer and reward shaping

Alessia Basler

Supervisor: Manuela Battipede. Politecnico di Torino, Master's degree programme in Aerospace Engineering, 2021

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

The purpose of this Master's thesis is to investigate and improve one of the state-of-the-art Safe Reinforcement Learning algorithms. The algorithm studied applies a Safety Layer to classical Reinforcement Learning algorithms in order to achieve Safe Exploration during the learning phases, which would open the door to real-world training of intelligent agents. The Safety Layer algorithm performs well in environments where the danger is located at the edges, but its performance degrades in environments where hazards are spread heterogeneously throughout the space. To improve performance in such situations, reward shaping is introduced to reinforce the safety action of the Safety Layer. The first chapters of the thesis introduce Artificial Intelligence, Deep Neural Networks, and classic and Deep Reinforcement Learning, in order to familiarize the reader with the main topics and algorithms that are analyzed in depth later in the document. One chapter is dedicated to Safe Reinforcement Learning: how it differs from classic Reinforcement Learning, its goals, and its main challenges. The last chapters cover the implementation of the chosen algorithm and the experimental results, with attention to the issues encountered and the solutions proposed; they also present the technical details of the experiments performed. Finally, conclusions are drawn about the improvements obtained, showing that reinforcing the Safety Layer action with reward shaping helps to achieve Safe Exploration in environments with heterogeneous danger distributions, which are more plausible representations of the real world.
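
As a rough illustration of the two ideas named in the abstract, the sketch below assumes a single safety constraint that is linear in the action, and a shaping term that penalizes the size of the correction made by the layer. Neither the constraint model nor the form of the penalty is stated in the abstract; the names `safety_layer`, `shaped_reward`, `g`, `c`, `limit`, and `penalty` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def safety_layer(action: List[float], g: List[float], c: float,
                 limit: float) -> Tuple[List[float], float]:
    """Project `action` onto the half-space {a : c + g.a <= limit}.

    Assumption: the safety signal is modelled as linear in the action
    (c + g.a). This is a hypothetical model, not taken from the thesis.
    Returns the corrected action and the Euclidean size of the correction.
    """
    g_dot_a = sum(gi * ai for gi, ai in zip(g, action))
    g_dot_g = sum(gi * gi for gi in g)
    if g_dot_g == 0.0:
        # The action has no modelled effect on the constraint; nothing to correct.
        return list(action), 0.0
    # Closed-form multiplier of the projection; zero when already safe.
    lam = max(0.0, (c + g_dot_a - limit) / g_dot_g)
    safe_action = [ai - lam * gi for ai, gi in zip(action, g)]
    correction = lam * g_dot_g ** 0.5
    return safe_action, correction


def shaped_reward(reward: float, correction: float, penalty: float = 1.0) -> float:
    """Hypothetical shaping: subtract a penalty proportional to the correction."""
    return reward - penalty * correction


if __name__ == "__main__":
    # The proposed action would push the predicted safety signal to 1.5,
    # above the limit of 1.0, so the layer scales it back.
    safe, corr = safety_layer([1.0, 0.0], g=[1.0, 0.0], c=0.5, limit=1.0)
    print(safe, corr, shaped_reward(1.0, corr, penalty=0.5))
```

In this sketch an action that already satisfies the constraint passes through unchanged and incurs no penalty, so the shaping term only affects the reward when the layer had to intervene.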

Supervisors: Manuela Battipede
Academic year: 2020/21
Publication type: Electronic
Number of Pages: 91
Subjects:
Degree programme: Master's degree programme in Aerospace Engineering
Degree class: New organization > Master science > LM-20 - AEROSPATIAL AND ASTRONAUTIC ENGINEERING
Collaborating companies: ADDFOR S.p.A
URI: http://webthesis.biblio.polito.it/id/eprint/18312