Few Shot Adaptation of VLM for Panoptic Segmentation

Masoud Karimi

Few Shot Adaptation of VLM for Panoptic Segmentation.

Supervisor: Andrea Bottino. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

This project aims to refine the SEEM (Segment Everything Everywhere All at Once) foundation model through advanced fine-tuning techniques and thorough dataset preparation. The dataset preparation process includes creating annotated ground-truth images for panoptic learning and generating grounding annotation files. These annotations encompass segmentation masks and bounding boxes, with each category assigned a unique color. Grounding annotations link segmentation details with descriptive sentences, establishing connections between segmentations and their context. Recently, transformers have demonstrated significant success in various computer vision domains thanks to their dynamic modeling capabilities and their ability to capture long-range dependencies. Vision transformers have outperformed CNN models in tasks such as object detection and semantic segmentation.
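The dataset preparation described above can be illustrated with a minimal sketch that derives a bounding box from a binary segmentation mask, packs a segment id into a unique RGB color, and links the segment to a descriptive sentence. The function names, the record fields, and the id-to-color packing (borrowed from the COCO panoptic PNG convention) are assumptions made for illustration; the annotation format actually used in the thesis is not specified in this abstract.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # binary mask: 1 marks a pixel of the segment


def mask_to_bbox(mask: Mask) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of the tightest box around the mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        raise ValueError("empty mask has no bounding box")
    x0, y0 = min(xs), min(ys)
    return x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1


def id_to_rgb(segment_id: int) -> Tuple[int, int, int]:
    """Pack an id into a color so that id = R + 256*G + 256**2*B (assumed convention)."""
    return (
        segment_id % 256,
        (segment_id // 256) % 256,
        (segment_id // 256**2) % 256,
    )


def make_grounding_record(
    segment_id: int, category: str, mask: Mask, sentence: str
) -> Dict[str, object]:
    """Build a hypothetical grounding entry tying a segment to a sentence."""
    return {
        "segment_id": segment_id,
        "category": category,
        "bbox": list(mask_to_bbox(mask)),
        "color": list(id_to_rgb(segment_id)),
        "sentence": sentence,
    }


if __name__ == "__main__":
    example_mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
    ]
    print(make_grounding_record(257, "car", example_mask, "the car on the left"))
```

Keeping the box, the color, and the sentence in one record is only one possible layout; separate panoptic and grounding files, as the abstract suggests, would carry the same fields split across two documents.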

To incorporate domain-specific knowledge into SEEM, we focused on fine-tuning the vision backbone, the segmentation head, or both, using adapters to preserve the model’s original parameters.
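To make the adapter idea concrete, the following is a minimal sketch of a residual bottleneck adapter in plain Python. The abstract does not say which adapter design was used, so the bottleneck form, the zero initialization of the up-projection, and all names (`BottleneckAdapter`, `w_down`, `w_up`) are assumptions. The point it shows is that the pretrained weights stay untouched: only the small adapter matrices would be trained, and at initialization the adapter leaves the features unchanged.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class BottleneckAdapter:
    """Residual adapter (assumed design): out = h + W_up @ relu(W_down @ h)."""

    def __init__(self, dim: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        self.bottleneck = bottleneck
        # Down-projection: small random values.
        self.w_down: Matrix = [
            [rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)
        ]
        # Up-projection starts at zero, so the adapter is an identity map
        # before any training and the frozen model's output is preserved.
        self.w_up: Matrix = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, v) for v in matvec(self.w_down, h)]  # ReLU bottleneck
        delta = matvec(self.w_up, z)
        return [hi + di for hi, di in zip(h, delta)]  # residual connection

    def num_parameters(self) -> int:
        """Number of trainable values added by the adapter."""
        return 2 * self.dim * self.bottleneck


if __name__ == "__main__":
    adapter = BottleneckAdapter(dim=4, bottleneck=2)
    features = [1.0, -2.0, 0.5, 3.0]  # stand-in for a frozen layer's output
    print(adapter(features) == features)
    print(adapter.num_parameters())
```

In a real setup the adapter would sit after selected layers of the backbone or the segmentation head, and the optimizer would be given only the adapter parameters.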

Supervisors

Andrea Bottino

Academic year

2023/24

Publication type

Electronic

Number of pages

Degree programme

Master's degree programme in Computer Engineering (Ingegneria Informatica)

Degree class

New degree system > Master's degree > LM-32 - COMPUTER ENGINEERING

URI

https://webthesis.biblio.polito.it/id/eprint/31780
