Deep Reinforcement Learning and Ultra-Wideband for autonomous navigation in service robotic applications

Enrico Sutera

Supervisor: Marcello Chiaberge. Politecnico di Torino, Master's degree course in Mechatronic Engineering (Ingegneria Meccatronica), 2019

License: Creative Commons Attribution Non-commercial No Derivatives.

Autonomous navigation for service robotics is one of the greatest challenges in the field, and the scientific community is devoting considerable effort to it. This work originated at PIC4SeR (PoliTo Interdepartmental Centre for Service Robotics) with the idea of tackling this challenge by merging two rediscovered and promising technologies: Deep Reinforcement Learning and Ultra-Wideband.

Over the past few years the world has seen huge advances in Artificial Intelligence, especially thanks to Machine Learning techniques. These include a branch called Deep Reinforcement Learning (DRL), which trains an Artificial Neural Network (ANN) from experience, i.e. without the need for huge datasets. Here DRL has been used to train an agent able to perform goal reaching and obstacle avoidance.

Ultra-wideband (UWB) is an emerging technology that can be used for short-range data transmission and localization, including in GPS-denied environments such as indoor ones. In this work UWB has been used for localization purposes. UWB is expected to be a key technology in the future: many large companies are involved, and Apple has already included a UWB chip in its latest product.

A differential-drive robot has been used as the implementation platform. The robot is controlled through ROS (Robot Operating System) by an ANN that takes robot pose, lidar and goal information as inputs and produces linear and angular speeds as outputs. The ANN is trained in a simulated environment using a DRL algorithm called Deep Deterministic Policy Gradient (DDPG). UWB has been used in the testing phase only. The overall system has been tested in a real environment and compared with human performance, showing that in some tasks it is able to match or even outdo it.

The results are satisfying and, although the difficulty of the challenge imposes strong limitations, the system is believed to meet expectations and to constitute a good baseline for future work.
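
As an illustration only, and not taken from the thesis, the interface described in the abstract (pose, lidar and goal information in; linear and angular speeds out) could be sketched as follows. The function names, the choice of distance and heading error as the goal information, the lidar normalization, the assumption that the actor outputs lie in [-1, 1], and all numeric limits are hypothetical.

```python
import math


def build_observation(pose, goal, lidar_ranges, max_range=3.5):
    """Assemble a flat network input from pose, goal and lidar data.

    Assumptions (not from the thesis): pose is (x, y, yaw) in metres and
    radians, goal is (x, y), and max_range is a hypothetical lidar limit.
    """
    x, y, yaw = pose
    gx, gy = goal
    dx, dy = gx - x, gy - y
    distance = math.hypot(dx, dy)
    # Angle from the robot heading to the goal, wrapped to [-pi, pi).
    heading_error = math.atan2(dy, dx) - yaw
    heading_error = (heading_error + math.pi) % (2 * math.pi) - math.pi
    # Clip each range to max_range (this also handles inf) and scale to [0, 1].
    scans = [min(r, max_range) / max_range for r in lidar_ranges]
    return [distance, heading_error] + scans


def scale_action(raw_action, max_linear=0.22, max_angular=2.0):
    """Map two actor outputs, assumed in [-1, 1], to velocity commands.

    The speed limits are hypothetical. Linear speed is mapped to
    [0, max_linear] (forward only), angular speed to [-max_angular, max_angular].
    """
    a_lin, a_ang = (max(-1.0, min(1.0, a)) for a in raw_action)
    linear = (a_lin + 1.0) / 2.0 * max_linear
    angular = a_ang * max_angular
    return linear, angular
```

In a ROS setup, the two returned values would typically fill the `linear.x` and `angular.z` fields of a velocity command message; how the thesis actually encodes its inputs and outputs is not stated in this record.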

Supervisors: Marcello Chiaberge
Academic year: 2019/20
Publication type: Electronic
Number of Pages: 94
Degree course: Master's degree course in Mechatronic Engineering (Ingegneria Meccatronica)
Degree class: New organization > Master science > LM-25 - AUTOMATION ENGINEERING
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/13162