Giovanni Sciortino
Vision Graph Neural Networks for Remote Sensing.
Supervisors: Paolo Garza, Luca Colomba. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2023
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
Modern computer vision approaches have mainly relied on convolutional neural networks, which treat images as regular grid structures. More recently, different approaches have been proposed to overcome the limitations of this view, such as its lack of flexibility, and to enlarge the receptive fields of neural network architectures. To address these limitations, graph-based neural networks have garnered increasing interest for computer vision tasks. Instead of a grid, these methods represent images as graphs that encapsulate relationships between spatial regions. In the graph, nodes correspond to image patches or regions, while edges characterize the spatial and semantic connections between them. Consequently, this representation provides a more adaptable way of encoding both local and long-range dependencies within the visual scene. In this thesis, we investigate the application of the Vision Graph Neural Network (ViG) architecture to multi-label land cover classification. Moreover, we evaluate different variations of the ViG architecture, analyzing the effectiveness of different message passing layers compared to the original formulation. We use the large-scale BigEarthNet Sentinel-2 multispectral dataset, one of the largest existing remote sensing archives. Given the pyramidal architecture of ViG, we examine the performance of three graph convolutional layers: max-relative, GCN, and graph attention (GAT) convolution. We further compare the model with and without relative positional encoding, and using all 12 Sentinel-2 spectral bands versus only the red-green-blue (RGB) bands. Experimental results demonstrate that Pyramid ViG outperforms architectures like ResNet-101 in terms of precision, recall, and F1 score. Among the graph layers, max-relative convolution (i.e., the original formulation of ViG) performs best, and relative positional encoding improves predictions across all analyzed settings. |
---|---|
Supervisors: | Paolo Garza, Luca Colomba |
Academic year: | 2023/24 |
Publication type: | Electronic |
Number of Pages: | 70 |
Degree programme: | Master's degree programme in Data Science And Engineering |
Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
Collaborating companies: | UNSPECIFIED |
URI: | http://webthesis.biblio.polito.it/id/eprint/29358 |
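
The abstract describes images as graphs whose nodes are patches and whose edges link related patches, with max-relative convolution as the best-performing message passing layer. The following is a minimal NumPy sketch of that idea, not the thesis implementation. It assumes edges are built by k-nearest neighbours in feature space and uses the commonly cited max-relative form, in which each node is concatenated with the element-wise maximum of its neighbour differences before a linear projection. The function names, array shapes, and random weights are hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_graph(x, k):
    """Return, for each node, the indices of its k nearest neighbours.

    x: (n, c) array of patch features. Neighbours are chosen by Euclidean
    distance in feature space (an assumption made for this sketch).
    """
    # Pairwise squared distances, shape (n, n).
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # a node is never its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def max_relative_conv(x, nbrs, w):
    """Max-relative graph convolution.

    For node i: concat(x_i, max_j (x_j - x_i)) @ w, with j ranging over
    the neighbours of i.
    """
    rel = x[nbrs] - x[:, None, :]  # (n, k, c) neighbour differences
    agg = rel.max(axis=1)          # (n, c) element-wise max over neighbours
    return np.concatenate([x, agg], axis=1) @ w  # (n, c_out)


rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))  # e.g. a 4x4 grid of patches, 8 channels each
w = rng.normal(size=(16, 8))        # hypothetical projection from 2c to c_out
nbrs = knn_graph(patches, k=4)
out = max_relative_conv(patches, nbrs, w)
```

In this sketch the graph is recomputed from the features themselves, so patches that are far apart in the image can still be connected when their features are similar, which is how the graph view captures long-range dependencies.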