Spacecraft Collision Avoidance: a Transformer-based Reinforcement Learning Approach

Paolo Cirrincione Paze

Spacecraft Collision Avoidance: a Transformer-based Reinforcement Learning Approach.

Supervisors: Manuela Battipede, Luigi Mascolo. Politecnico di Torino, Master's degree course in Aerospace Engineering, 2025


License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	The growth of the space economy and the ever-increasing interest in space have led to the progressive congestion of the most commercially viable Earth orbits. More satellites are launched around our planet each year, increasing the risk of collisions between space objects, collisions that could generate millions of debris fragments and make the orbital environment even more dangerous. The need for collision avoidance tools and techniques has never been more pressing, as spacecraft must perform avoidance maneuvers with increasing frequency. In this scenario, trajectory optimization becomes paramount for averting collisions as effectively as possible. This research proposes a Deep Reinforcement Learning framework to optimize the trajectory of a satellite in low Earth orbit (LEO) facing multiple collision warnings. The proposed approach accounts for imperfect environmental modeling and measurements by formulating the problem as a Partially Observable Markov Decision Process (POMDP). To add flexibility, the states of a variable number of debris objects are first processed by a Long Short-Term Memory (LSTM) network into a fixed-size summary of the information on the multiple space objects, which is then concatenated with the observation of the spacecraft state. In this way, the hidden state information is replaced with a belief vector derived from the observation time sequence (history), which is weighted by a Transformer encoder to capture the non-linear dynamics of the signals. The resulting semantic history guides an agent that employs Proximal Policy Optimization (PPO), a model-free, on-policy policy-gradient method relying on two neural networks implemented as Multi-Layer Perceptrons: a critic for value estimation and an actor that represents the policy. The model includes the motion of the satellite and multiple debris objects in LEO under the J2 gravitational perturbation and atmospheric drag.
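The force model named in the last sentence (point-mass gravity, J2 oblateness, atmospheric drag) can be sketched as a single acceleration function. This is a minimal sketch, not the thesis's propagator: it assumes an Earth-centred inertial frame, a spherical Earth when computing altitude, a single-layer exponential atmosphere with assumed illustrative reference values near 400 km, and a hypothetical drag coefficient and area-to-mass ratio (`cd`, `area_to_mass`). The names `density` and `acceleration` are illustrative.

```python
import math

# Earth constants (SI units)
MU = 3.986004418e14        # gravitational parameter [m^3/s^2]
R_E = 6378137.0            # equatorial radius [m]
J2 = 1.08262668e-3         # second zonal harmonic [-]
OMEGA_E = 7.292115e-5      # Earth rotation rate [rad/s]

# Assumed single-layer exponential atmosphere (illustrative values near 400 km)
RHO_0 = 2.803e-12          # reference density [kg/m^3]
H_0 = 400e3                # reference altitude [m]
SCALE_H = 58.515e3         # scale height [m]


def density(altitude_m):
    """Exponential density model (a simplifying assumption)."""
    return RHO_0 * math.exp(-(altitude_m - H_0) / SCALE_H)


def acceleration(r, v, cd=2.2, area_to_mass=0.01):
    """Two-body + J2 + drag acceleration in an Earth-centred inertial frame.

    r, v: position [m] and velocity [m/s] as 3-tuples.
    cd, area_to_mass: hypothetical drag coefficient and A/m ratio [m^2/kg].
    """
    x, y, z = r
    rn = math.sqrt(x * x + y * y + z * z)

    # Point-mass (Keplerian) gravity
    k = -MU / rn**3
    a = [k * x, k * y, k * z]

    # J2 oblateness perturbation
    f = -1.5 * J2 * MU * R_E**2 / rn**5
    zr2 = (z / rn) ** 2
    a[0] += f * x * (1.0 - 5.0 * zr2)
    a[1] += f * y * (1.0 - 5.0 * zr2)
    a[2] += f * z * (3.0 - 5.0 * zr2)

    # Drag, using velocity relative to an atmosphere co-rotating with Earth:
    # v_rel = v - omega x r, with omega = (0, 0, OMEGA_E)
    v_rel = (v[0] + OMEGA_E * y, v[1] - OMEGA_E * x, v[2])
    v_rel_n = math.sqrt(sum(c * c for c in v_rel))
    rho = density(rn - R_E)  # spherical-Earth altitude (simplification)
    d = -0.5 * rho * cd * area_to_mass * v_rel_n
    for i in range(3):
        a[i] += d * v_rel[i]
    return tuple(a)
```

Passing this function to a standard ODE integrator (for example a fixed-step RK4) would propagate both the satellite and each debris object; in an equatorial circular orbit the J2 term slightly strengthens the radial pull, while drag produces a small along-track deceleration.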
The reward function is designed to drive collision probabilities below a critical threshold while minimizing fuel consumption. A station-keeping requirement is also introduced. Key simulation results are presented, highlighting the trends of the most important physical quantities and the learning progress of the neural networks. The research concludes by analyzing the implications of the method and its potential applications.
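The three objectives described above (collision probability below a threshold, low fuel use, station keeping) can be combined into a single per-step reward. The sketch below is only one possible shaping under stated assumptions: the threshold, all weights, the log-scale collision penalty, and treating station keeping as a distance penalty are hypothetical choices, and `RewardWeights` and `reward` are illustrative names, not the thesis's implementation.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    # All values are hypothetical placeholders, not the thesis's tuning.
    pc_threshold: float = 1e-4   # assumed critical collision probability
    w_collision: float = 10.0    # penalty per decade above the threshold
    w_fuel: float = 1.0          # penalty per m/s of delta-v
    w_station: float = 1e-3      # penalty per km from the nominal orbit
    bonus_safe: float = 1.0      # reward when every encounter is below threshold


def reward(collision_probs: Sequence[float], delta_v: float,
           deviation_km: float, w: RewardWeights = RewardWeights()) -> float:
    """Illustrative per-step reward combining the three objectives."""
    worst = max(collision_probs, default=0.0)
    if worst <= w.pc_threshold:
        r_collision = w.bonus_safe
    else:
        # Penalize by how many orders of magnitude the worst Pc exceeds the threshold
        r_collision = -w.w_collision * math.log10(worst / w.pc_threshold)
    r_fuel = -w.w_fuel * abs(delta_v)
    r_station = -w.w_station * abs(deviation_km)
    return r_collision + r_fuel + r_station
```

A logarithmic collision term is used here because collision probabilities typically span several orders of magnitude, so a linear penalty would give the agent almost no gradient until the probability is already very high; the bonus for being below the threshold, in turn, discourages unnecessary maneuvers once the encounter is safe.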
Supervisors:	Manuela Battipede, Luigi Mascolo
Academic year:	2024/25
Publication type:	Electronic
Number of pages:	114
Degree course:	Master's degree course in Aerospace Engineering
Degree class:	New regulations > Master's degree > LM-20 - AEROSPACE AND ASTRONAUTICAL ENGINEERING
Collaborating companies:	Politecnico di Torino
URI:	http://webthesis.biblio.polito.it/id/eprint/36774
