
A Deep Reinforcement Learning Framework for Autonomous and Time-Critical Collision Avoidance in Low Earth Orbit

Alberto Preti

A Deep Reinforcement Learning Framework for Autonomous and Time-Critical Collision Avoidance in Low Earth Orbit.

Supervisors: Paolo Maggiore, Davide Conte. Politecnico di Torino, Master's degree programme in Aerospace Engineering, 2025

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Download (15MB)
Abstract:

The growing population of objects in orbit, together with the exponential growth of mega-constellations, is significantly increasing the risk of collisions in Low Earth Orbit (LEO). This trend makes collision avoidance crucial to prevent the generation of new debris which, by triggering a chain reaction, could lead to the cascading scenario predicted by the Kessler syndrome, compromising future access to Earth orbit. Traditional approaches to collision avoidance maneuver planning based on ground operations are not scalable to the expected future traffic volumes and are ineffective in time-critical scenarios, where the time window available to plan and implement a maneuver can shrink to just a few hours before the time of closest approach (TCA). To address these challenges, this thesis proposes a framework based on Deep Reinforcement Learning for the autonomous planning of time-critical collision avoidance maneuvers in LEO, characterized by decision windows as short as 1-2 hours before TCA. The problem is formalized as a fully observable Markov decision process, in which an agent interacts with a simulated environment through the application of instantaneous Δv impulses. The action chosen by the agent is evaluated through a reward function designed to minimize the probability of collision, keep the satellite within its operational orbit, and optimize the consumption of the available propulsion budget. The simulated environment with which the agent interacts is a customized orbital propagator that allows, in addition to an unperturbed two-body model, the selection of perturbations relevant to the scenario under consideration, such as atmospheric drag, Earth's gravitational harmonics, solar radiation pressure, and third-body gravitational perturbations from the Sun and Moon.
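As an illustration of the kind of reward shaping described above, the following is a minimal sketch of a per-step reward combining the three objectives (collision probability, orbit keeping, propellant use). The weights, the safety threshold, and the log-scale treatment of the collision probability are illustrative assumptions, not the values used in the thesis.

```python
import math

def step_reward(p_collision, orbit_error_km, dv_used_ms,
                w_pc=10.0, w_orbit=1.0, w_dv=0.5, pc_safe=1e-6):
    """Scalar reward for one decision step (illustrative sketch).

    p_collision    -- estimated probability of collision at TCA
    orbit_error_km -- deviation from the nominal operational orbit [km]
    dv_used_ms     -- magnitude of the impulsive Δv applied [m/s]
    """
    # Penalize Pc only above a safety threshold; a log scale is used
    # because collision probabilities span many orders of magnitude.
    pc_term = -w_pc * max(0.0,
                          math.log10(max(p_collision, 1e-12))
                          - math.log10(pc_safe))
    # Penalize drifting from the operational orbit and spending propellant.
    orbit_term = -w_orbit * orbit_error_km
    dv_term = -w_dv * dv_used_ms
    return pc_term + orbit_term + dv_term
```

With this shaping, a maneuver that drives the collision probability below the threshold at a small Δv cost scores higher than either doing nothing (high Pc penalty) or over-maneuvering (high Δv and orbit-keeping penalties).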
To learn an optimal maneuvering strategy, the agent is trained with the Proximal Policy Optimization algorithm, based on an actor-critic architecture consisting of two multilayer perceptron neural networks. Moreover, to ensure generalization capability, a database of collision scenarios was generated to expose the agent to different collision geometries with space debris during the training phase. The evaluation of the learned policy demonstrated the effectiveness of the proposed framework. The agent successfully planned avoidance maneuvers in scenarios never encountered during training with a total Δv on the order of 2 m/s, while ensuring that the satellite remained within its nominal operational orbit. In addition, a post-maneuver analysis based on the orbital propagation of the satellite, including most of the modeled perturbations, verified the absence of recurrent close approaches with the same debris. The proposed framework is therefore fully compatible with time-critical scenarios, since the time required for maneuver planning corresponds to the inference time of the policy neural network, which is on the order of a few seconds.
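The actor-critic architecture described above can be sketched as two separate multilayer perceptrons: one mapping the observed state to a 3-component Δv action (actor) and one to a scalar value estimate (critic). The layer sizes, the tanh activation, and the observation dimension below are assumptions for illustration; maneuver planning at deployment reduces to a single forward pass through the actor network, which is why inference takes only seconds.

```python
import numpy as np

def mlp_init(sizes, rng):
    """Initialize an MLP as a list of (weights, bias) pairs."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

rng = np.random.default_rng(0)
obs_dim = 12  # assumed: relative state + conjunction geometry features

actor = mlp_init([obs_dim, 64, 64, 3], rng)   # mean of the Δv action
critic = mlp_init([obs_dim, 64, 64, 1], rng)  # state-value estimate

state = rng.standard_normal(obs_dim)
dv_mean = mlp_forward(actor, state)   # policy inference: one forward pass
value = mlp_forward(critic, state)    # used only during PPO training
```

During PPO training the critic's value estimates feed the advantage computation used to update the actor; at deployment only the actor is evaluated, so the critic adds no inference cost.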

Supervisors: Paolo Maggiore, Davide Conte
Academic year: 2025/26
Publication type: Electronic
Number of pages: 130
Subjects:
Degree programme: Master's degree programme in Aerospace Engineering
Degree class: New regulations > Master's degree > LM-20 - AEROSPACE AND ASTRONAUTICAL ENGINEERING
Collaborating institutions: Embry Riddle University
URI: http://webthesis.biblio.polito.it/id/eprint/38569