Safe Exploration with Safety Layer and reward shaping

Alessia Basler

Safe Exploration with Safety Layer and reward shaping.

Supervisor: Manuela Battipede. Politecnico di Torino, Master's degree programme in Aerospace Engineering, 2021

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

The purpose of this Master's thesis is to investigate and improve one of the state-of-the-art Safe Reinforcement Learning algorithms. The algorithm under study applies a Safety Layer to classical Reinforcement Learning algorithms to achieve Safe Exploration during the learning phase, which would open the door to training intelligent agents in the real world. The Safety Layer algorithm performs well in environments where the danger lies at the edges, but its performance degrades in environments where hazards are spread heterogeneously throughout the space. To improve performance in such situations, reward shaping is introduced to reinforce the safety action of the Safety Layer.
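As a rough illustration of how the two ideas in this abstract can fit together, the sketch below shows one common formulation of a safety layer (a single linearized constraint with a closed-form action correction) next to a simple distance-based shaping penalty. The abstract does not state which formulation the thesis uses, so everything here is an assumption made for illustration: the function names `safety_layer` and `shaped_reward`, the constraint model `c_value + c_grad . a <= c_limit`, and the `margin` and `weight` parameters are all hypothetical.

```python
import numpy as np


def safety_layer(action, c_value, c_grad, c_limit):
    """Correct an action so a linearized safety constraint holds.

    Assumed (hypothetical) constraint model:
        c_value + c_grad . action <= c_limit
    If the proposed action satisfies it, the action is returned unchanged.
    Otherwise the action is moved the minimum Euclidean distance needed
    to land on the constraint boundary.
    """
    action = np.asarray(action, dtype=float)
    c_grad = np.asarray(c_grad, dtype=float)
    violation = c_value + c_grad @ action - c_limit
    grad_sq = c_grad @ c_grad
    if violation <= 0.0 or grad_sq == 0.0:
        # Already safe, or the action has no influence on the constraint.
        return action
    lam = violation / grad_sq  # multiplier of the active constraint
    return action - lam * c_grad


def shaped_reward(reward, hazard_distance, margin=1.0, weight=0.5):
    """Subtract a penalty that grows as the agent gets closer to a hazard.

    The penalty is zero when hazard_distance >= margin. Both margin and
    weight are illustrative values, not taken from the thesis.
    """
    penalty = weight * max(0.0, margin - hazard_distance)
    return reward - penalty


# Usage: an unsafe action is corrected, then the reward is shaped.
proposed = np.array([1.0, 0.0])
corrected = safety_layer(proposed, c_value=0.5, c_grad=[1.0, 0.0], c_limit=1.0)
r = shaped_reward(1.0, hazard_distance=0.4)
```

In this sketch the safety layer acts only at the moment a constraint would be violated, while the shaping term discourages the policy from approaching hazards in the first place, which is one way to read the abstract's remark that reward shaping reinforces the Safety Layer when hazards are spread throughout the space.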

The first chapters of the thesis introduce Artificial Intelligence, Deep Neural Networks, and classic and Deep Reinforcement Learning.