Reinforcement Learning Based Strategic Exploration Algorithm for UAVs Fleets

Cosimo Bromo

Reinforcement Learning Based Strategic Exploration Algorithm for UAVs Fleets.

Supervisors: Giorgio Guglieri, Simone Godio. Politecnico di Torino, Master's degree programme in Mechatronic Engineering (Ingegneria Meccatronica), 2022

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Autonomous navigation systems are becoming increasingly pervasive in everyday life and work. Unmanned Aircraft Systems (UASs) have developed rapidly in recent years, entering different market segments and gaining popularity for their versatility and usefulness. Besides the economic benefits of their use, ranging from crop monitoring in agriculture to fast parcel delivery, their greatest advantage is their suitability for hazardous and high-risk operations. Their rapid growth is mainly associated with the quick development of algorithms and strategies for autonomous navigation and task execution, involving both traditional approaches and artificial intelligence (AI) algorithms. One of the most challenging but rewarding fields of study is the coordinated behavior of a number of agents that collaborate to carry out the same high-level task while pursuing distinct low-level objectives. Fleets of Unmanned Aerial Vehicles (UAVs) may be particularly useful for time-sensitive operations, in which battery autonomy and time minimization are the most stringent requirements. Several applications may exploit this potential, provided that efficient collaboration models and strategies are implemented. In this thesis, a Reinforcement Learning (RL) approach for coverage planning is presented. The main aim is to efficiently map an environment using a fleet of 2 to 10 UAVs while recognizing and avoiding obstacles. This objective poses several difficulties, notably those related to collaborative behavior: each fleet member should move autonomously while taking into account both unexplored areas and the positions of the other drones, in order to avoid mutual collisions and inefficient spreading in the environment.
UAVs are trained to accomplish these tasks in a shared environment by means of the Proximal Policy Optimization (PPO) algorithm, a policy gradient method that here uses Convolutional Neural Networks (CNNs) for policy and value function approximation. Training is performed with a novel, modified version of PPO that exploits all agents' trajectories to concurrently update a shared policy function, which is subsequently tested in a decentralized fashion with a variable number of UAVs. The trained fleets' performance is then assessed in terms of energy consumption, distribution statistics and coverage task accomplishment in simulated test environments.
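
As a rough illustration of the idea of updating one shared policy from all agents' trajectories, the sketch below pools per-agent batches and evaluates the standard PPO clipped surrogate loss on the pooled data. It is not the thesis's modified algorithm, whose details are not given in this record. The function names, the dictionary keys, the clipping value of 0.2 and the numeric data are all hypothetical, and the CNN policy is replaced by precomputed log-probabilities.

```python
import numpy as np


def pool_agent_batches(batches):
    """Concatenate per-agent arrays into a single batch (hypothetical helper)."""
    keys = batches[0].keys()
    return {k: np.concatenate([b[k] for b in batches]) for k in keys}


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss, returned as a value to minimize."""
    ratio = np.exp(new_logp - old_logp)  # probability ratio pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))


# Made-up transitions from two UAVs that share the same policy.
agent_batches = [
    {
        "new_logp": np.array([-1.0, -0.5]),
        "old_logp": np.array([-1.0, -0.7]),
        "adv": np.array([1.0, -1.0]),
    },
    {
        "new_logp": np.array([-0.2]),
        "old_logp": np.array([-0.9]),
        "adv": np.array([2.0]),
    },
]

pooled = pool_agent_batches(agent_batches)
loss = ppo_clip_loss(pooled["new_logp"], pooled["old_logp"], pooled["adv"])
print(f"pooled transitions: {pooled['adv'].shape[0]}, loss: {loss:.4f}")
```

Because the loss is averaged over the pooled batch, every agent's experience contributes to the same gradient step. This is one plausible reading of "concurrently update a shared policy function", under the assumptions stated above.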

Supervisors: Giorgio Guglieri, Simone Godio
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 106
Degree programme: Master's degree programme in Mechatronic Engineering (Ingegneria Meccatronica)
Degree class: New organization > Master science > LM-25 - AUTOMATION ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/23488