SAM Meets FC-CLIP: Advancing Open Vocabulary Segmentation in Satellite Imagery

Jacopo Lungo Vaschetti

SAM Meets FC-CLIP: Advancing Open Vocabulary Segmentation in Satellite Imagery.

Supervisors: Paolo Garza, Edoardo Arnaudo. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Our work addresses the challenge of open vocabulary semantic segmentation for very high-resolution satellite imagery. This computer vision task goes beyond traditional semantic segmentation, which assigns one of a predefined set of category labels to each image pixel. Instead, the open vocabulary approach enables the dynamic identification of any object or region through natural language queries, removing the constraint of a fixed set of categories. This flexibility is a critical advancement for remote sensing applications, given the highly diverse scenes captured in satellite observations. We propose two novel solutions that build upon and enhance FC-CLIP, a state-of-the-art open vocabulary model originally designed for natural images. Our first solution, Remote FC-CLIP, integrates a remote sensing-specific CLIP model (Remote CLIP) into the baseline model's architecture, followed by fine-tuning on the OpenEarthMap (OEM) dataset. The second approach, SAM-FC-CLIP, combines the Segment Anything Model (SAM) for mask extraction with modified classification components from FC-CLIP. This model was trained on a custom-built dataset that combines the OEM and iSAID datasets, an effective way to tackle the persistent scarcity of comprehensive training data in the remote sensing domain. Results show that Remote FC-CLIP outperforms the baselines: it excels on classes present in the training set but generalizes less well to novel categories. In contrast, our SAM-based solution shows strong open vocabulary capabilities, surpassing both the baseline models and Remote FC-CLIP in identifying previously unseen classes. Despite the challenges posed by the scarcity of comprehensive satellite imagery datasets, these findings represent a step forward in this emerging field and reveal promising directions for future research.
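
The mask-then-classify idea described in the abstract can be illustrated with a toy sketch. This is not the thesis's implementation: it assumes that class-agnostic masks (such as those a SAM-style model might produce) are classified by averaging per-pixel features inside each mask and comparing the result with text-query embeddings via cosine similarity. The function names, the tiny 2x2 "image", and all embedding values are hypothetical and chosen only for illustration.

```python
import math

def mask_pool(features, mask):
    """Average the per-pixel feature vectors that fall inside a binary mask."""
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for r, row in enumerate(mask):
        for c, inside in enumerate(row):
            if inside:
                for k in range(dim):
                    total[k] += features[r][c][k]
                count += 1
    if count == 0:
        raise ValueError("empty mask")
    return [t / count for t in total]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify_masks(features, masks, text_embeddings):
    """Label each mask with the text query whose embedding is most similar."""
    labels = []
    for mask in masks:
        pooled = mask_pool(features, mask)
        best = max(text_embeddings,
                   key=lambda name: cosine(pooled, text_embeddings[name]))
        labels.append(best)
    return labels

# Toy 2x2 image with 2-dimensional pixel features (hypothetical values).
features = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
]
# Two class-agnostic masks: the top row and the bottom row of the image.
masks = [
    [[1, 1], [0, 0]],
    [[0, 0], [1, 1]],
]
# Stand-ins for text embeddings of free-form queries (hypothetical values).
text_embeddings = {"water": [1.0, 0.0], "forest": [0.0, 1.0]}

labels = classify_masks(features, masks, text_embeddings)
print(labels)  # ['water', 'forest']
```

Because the label set is just the keys of `text_embeddings`, adding a new query at inference time requires no retraining in this sketch, which is the property that distinguishes open vocabulary segmentation from a fixed-category classifier.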

Supervisors: Paolo Garza, Edoardo Arnaudo
Academic year: 2024/25
Publication type: Electronic
Number of pages: 80
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New regulations > Master's degree > LM-32 - INGEGNERIA INFORMATICA (Computer Engineering)
Collaborating companies: FONDAZIONE LINKS
URI: http://webthesis.biblio.polito.it/id/eprint/34028