Few Shot Adaptation of VLM for Panoptic Segmentation
Masoud Karimi
Rel. Andrea Bottino. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2024
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
This project aims to refine the SEEM (Segment Everything Everywhere All At Once Model) foundation model through advanced fine-tuning techniques and thorough dataset preparation. Dataset preparation includes creating annotated ground-truth images for panoptic learning and generating grounding annotation files. These annotations comprise segmentation masks and bounding boxes, with each category uniquely colored. Grounding annotations link segmentation details with descriptive sentences, establishing connections between segmentations and their context. Recently, transformers have demonstrated significant success in various computer vision domains thanks to their dynamic modeling capabilities and ability to capture long-range dependencies. Vision transformers have outperformed CNN models in tasks such as object detection and semantic segmentation.
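The grounding-annotation step described above can be sketched as follows. This is a minimal illustration of linking one segmentation (its category color and bounding box) to a descriptive sentence; the field names and JSON layout are assumptions for illustration, not SEEM's actual annotation schema.

```python
import json

def make_grounding_annotation(image_id, category, mask_color, bbox, sentence):
    """Build one grounding entry tying a segmentation to a descriptive sentence.

    mask_color is the unique RGB triple assigned to this category in the
    ground-truth image; bbox is (x, y, width, height). All field names are
    illustrative -- the real SEEM schema may differ.
    """
    return {
        "image_id": image_id,
        "category": category,
        "mask_color": list(mask_color),
        "bbox": list(bbox),
        "grounding_text": sentence,
    }

entry = make_grounding_annotation(
    image_id=42,
    category="person",
    mask_color=(220, 20, 60),
    bbox=(10, 20, 100, 200),
    sentence="a person walking on the left side of the road",
)
print(json.dumps(entry, indent=2))
```

One such entry would be emitted per annotated instance, so that each uniquely colored mask in the ground-truth image has both geometric (mask, box) and textual (grounding sentence) supervision.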
To incorporate domain-specific knowledge into SEEM, we focused on fine-tuning the vision backbone, the segmentation head, or both, using adapters to preserve the model's original parameters.
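The adapter idea can be sketched in a few lines: the pretrained weights stay frozen, while a small bottleneck branch (down-projection, non-linearity, up-projection) is added residually and trained. This toy NumPy sketch is a generic bottleneck adapter under assumed dimensions, not SEEM's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck = 16, 4  # illustrative sizes, not SEEM's real dimensions

# Frozen backbone weight: the original pretrained parameters stay untouched.
W_frozen = rng.normal(size=(d_model, d_model))

# Small trainable adapter: down-project, ReLU, up-project.
W_down = rng.normal(scale=0.01, size=(d_model, d_bottleneck))
W_up = np.zeros((d_bottleneck, d_model))  # zero-init: adapter starts as a no-op

def layer_with_adapter(x):
    h = x @ W_frozen                            # frozen backbone computation
    h = h + np.maximum(h @ W_down, 0.0) @ W_up  # residual adapter branch
    return h

x = rng.normal(size=(2, d_model))
out = layer_with_adapter(x)
# With W_up zero-initialized, the adapter contributes nothing at the start
# of training, so the fine-tuned model initially matches the pretrained one.
assert np.allclose(out, x @ W_frozen)
```

During fine-tuning only `W_down` and `W_up` receive gradients, which is why the original model parameters are preserved exactly.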