Alessandro Iucci
Explainable Reinforcement Learning for Risk Mitigation in a human-robot collaboration scenario.
Advisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2021
PDF (Tesi_di_laurea), Thesis. License: Creative Commons Attribution Non-commercial No Derivatives. (6 MB)
Abstract:

Reinforcement Learning (RL) algorithms are highly popular in the robotics field because they can solve complex control problems, learn from dynamic environments and generate optimal outcomes. Explainability for all Machine Learning (ML)-based algorithms, including RL, is gaining importance because of the increasing complexity of the models, which makes them more accurate but at the same time less transparent. The need for explainability increases even more in Human-Robot Collaboration (HRC) scenarios, where safety is an important aspect to be guaranteed. This work focuses on the application of two explainability techniques, “Reward Decomposition” and “Autonomous Policy Explanation”, to an RL algorithm that is the core of a risk mitigation module for robot operation in an automated warehouse, an HRC environment where humans and robots work together without harming one another. The first technique, “Reward Decomposition”, gives insight into the factors that influenced the robot's choice by decomposing the reward function into sub-functions, each considering a specific aspect of the robot's state, and presenting the result graphically. It also allows the construction of Minimal Sufficient Explanations (MSX), minimal sets of relevant reasons for each decision taken during the robot's operation. The second technique, “Autonomous Policy Explanation”, provides a global overview of the robot's behaviour, letting it answer both targeted and general queries posed by human users while also giving insight into the decision guidelines embedded in the policy learnt by the robot. Since the synthesised policy descriptions and query answers are in natural language, this tool facilitates algorithm diagnosis even by non-expert users. The work analyses the results of applying both techniques: they increased the transparency of the robot's decision process, building trust in its choices, which proved to be among the optimal ones in most cases, and also made it possible to find weaknesses in the robot's policy that are helpful for debugging purposes.
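As a concrete illustration of the first technique, the sketch below shows how per-component Q-values can be compared to build an MSX for a preferred action. This is not code from the thesis: the component names, Q-values, and the greedy selection are illustrative assumptions, following the general reward-decomposition formulation in which the Q-function is a sum of per-reward-component Q-functions.

```python
import numpy as np

def msx_plus(q_a, q_b, component_names):
    """Greedy Minimal Sufficient Explanation (MSX+): the smallest set of
    reward components whose advantage for action a over action b outweighs
    the total disadvantage. Greedy selection is a sketch, not the thesis code."""
    delta = np.asarray(q_a, dtype=float) - np.asarray(q_b, dtype=float)
    disadvantage = -delta[delta < 0].sum()   # total evidence against action a
    explanation, advantage = [], 0.0
    # Add the largest positive advantages until they outweigh the disadvantage.
    for i in np.argsort(-delta):
        if delta[i] <= 0 or advantage > disadvantage:
            break
        explanation.append(component_names[i])
        advantage += delta[i]
    return explanation

# Hypothetical decomposed Q-values for two candidate robot actions,
# with components for human safety, collision risk and task progress.
names = ["human_safety", "collision_risk", "task_progress"]
q_slow_down = [4.0, 2.5, 0.5]    # decomposed Q(s, slow_down)
q_keep_speed = [1.0, 1.5, 2.0]   # decomposed Q(s, keep_speed)
print(msx_plus(q_slow_down, q_keep_speed, names))
# -> ['human_safety']: safety alone outweighs the lost task progress
```

Under these assumptions, the returned set is the graphical explanation's textual core: the fewest reward components a user must look at to understand why the robot preferred one action over another.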
| Field | Value |
|---|---|
| Advisors | Paolo Garza |
| Academic year | 2020/21 |
| Publication type | Electronic |
| Number of pages | 85 |
| Subjects | |
| Degree course | Master's degree programme in Ingegneria Informatica (Computer Engineering) |
| Degree class | New system > Master's degree > LM-32 - Computer Engineering |
| Co-supervising institution | KTH - Kungl. Tekniska Hogskolan (Royal Institute of Technology) (SWEDEN) |
| Collaborating companies | ERICSSON |
| URI | http://webthesis.biblio.polito.it/id/eprint/18126 |