Vision Graph Neural Networks for Remote Sensing

Giovanni Sciortino

Vision Graph Neural Networks for Remote Sensing.

Supervisors: Paolo Garza, Luca Colomba. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2023

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Modern computer vision approaches have mainly relied on convolutional neural networks, which treat images as regular grid structures. More recently, different approaches have been proposed to overcome the limitations of this view, such as its lack of flexibility, and to enlarge the receptive fields of neural network architectures. Among them, graph-based neural networks have garnered increasing interest for computer vision tasks. Instead of a grid, these methods represent images as graphs that encode relationships between spatial regions. In the graph, nodes correspond to image patches or regions, while edges characterize the spatial and semantic connections between them. This representation provides a more adaptable way of encoding both local and long-range dependencies within the visual scene. In this thesis, we investigate the application of the Vision Graph Neural Network (ViG) architecture to multi-label land cover classification. We also evaluate several variants of the ViG architecture, comparing the effectiveness of different message passing layers against the original formulation. We use the large-scale BigEarthNet Sentinel-2 multispectral dataset, one of the largest existing remote sensing archives. Building on the pyramidal ViG architecture (Pyramid ViG), we examine the performance of three graph convolutional layers: max-relative, GCN, and graph attention (GAT) convolution. We further compare the model with and without relative positional encoding, and with all 12 Sentinel-2 spectral bands versus only the red-green-blue (RGB) bands. Experimental results demonstrate that Pyramid ViG outperforms architectures such as ResNet-101 in terms of precision, recall, and F1 score. Among the graph layers, max-relative convolution (i.e., the original formulation of ViG) performs best, and relative positional encoding improves predictions across all analyzed settings.
Supervisors:	Paolo Garza, Luca Colomba
Academic year:	2023/24
Publication type:	Electronic
Number of pages:	70
Subjects:
Degree programme:	Master's degree programme in Data Science And Engineering
Degree class:	New system > Master's degree > LM-32 - INGEGNERIA INFORMATICA (Computer Engineering)
Collaborating companies:	Not specified
URI:	http://webthesis.biblio.polito.it/id/eprint/29358
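
The abstract describes images as graphs whose nodes are patches, and names max-relative convolution as the best-performing layer. The NumPy sketch below is only an illustration of that idea under stated assumptions, not the thesis implementation: the patch embeddings are random placeholders, neighbors are chosen by k-nearest neighbors in feature space, and the learned linear projection that normally follows the aggregation is omitted. The function names `knn_graph` and `max_relative_aggregate` are hypothetical.

```python
import numpy as np

def knn_graph(features, k):
    """Return an (N, k) array with the indices of each node's k nearest neighbors."""
    # Pairwise squared Euclidean distances between node features.
    sq = np.sum(features ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def max_relative_aggregate(features, neighbors):
    """Concatenate each node's features with the element-wise max of (x_j - x_i)."""
    relative = features[neighbors] - features[:, None, :]  # shape (N, k, D)
    aggregated = relative.max(axis=1)                      # shape (N, D)
    return np.concatenate([features, aggregated], axis=1)  # shape (N, 2D)

# Assumption: 16 patch embeddings of dimension 8, drawn at random for illustration.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))
neighbors = knn_graph(patches, k=4)
out = max_relative_aggregate(patches, neighbors)
```

In a full model the concatenated output would typically be passed through a learned layer and the graph rebuilt at each stage; both steps are left out here to keep the aggregation itself visible.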
