Masoud Karimi
Few Shot Adaptation of VLM for Panoptic Segmentation.
Supervisor: Andrea Bottino. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2024
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
This project aims to refine the SEEM (Segment Everything Everywhere All At Once Model) foundation model through advanced fine-tuning techniques and thorough dataset preparation. The dataset preparation process includes creating annotated ground truth images for panoptic learning and generating grounding annotation files. These annotations encompass segmentation masks and bounding boxes, with each category uniquely colored. Grounding annotations link segmentation details with descriptive sentences, establishing connections between segmentations and their context. Recently, transformers have demonstrated significant success in various computer vision domains due to their dynamic modeling capabilities and their ability to capture long-range dependencies. Vision transformers have outperformed CNN models in tasks like object detection and semantic segmentation.
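As a rough illustration of the dataset preparation described above, the following minimal sketch derives one tight bounding box per category from a 2D mask of category ids, assigns each category a unique color, and attaches a descriptive sentence. The color encoding, the record fields, and all function names are hypothetical and are not taken from the thesis or from the SEEM codebase.

```python
from typing import Dict, List, Tuple

# (x_min, y_min, x_max, y_max), inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def category_color(category_id: int) -> Tuple[int, int, int]:
    """Map a category id to a unique RGB triple (hypothetical base-256 encoding)."""
    return (category_id % 256, (category_id // 256) % 256, (category_id // 65536) % 256)


def bounding_boxes(mask: List[List[int]], background: int = 0) -> Dict[int, Box]:
    """Return the tight bounding box of every non-background category in the mask."""
    boxes: Dict[int, Box] = {}
    for y, row in enumerate(mask):
        for x, cat in enumerate(row):
            if cat == background:
                continue
            if cat not in boxes:
                boxes[cat] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[cat]
                boxes[cat] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


def grounding_records(mask: List[List[int]], sentences: Dict[int, str]) -> List[dict]:
    """Link each segmented category to its box, color, and descriptive sentence.

    The record layout is an assumption made for illustration only.
    """
    return [
        {
            "category_id": cat,
            "bbox": box,
            "color": category_color(cat),
            "sentence": sentences.get(cat, ""),
        }
        for cat, box in sorted(bounding_boxes(mask).items())
    ]


example_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 2, 2],
]
example_records = grounding_records(
    example_mask, {1: "a crate on the left", 2: "a pallet in the corner"}
)
```

A real panoptic pipeline would typically also distinguish individual instances of the same category; this sketch works at category level only, matching the per-category coloring mentioned in the abstract.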
To incorporate domain-specific knowledge into SEEM, we focused on fine-tuning the vision backbone, the segmentation head, or both, using adapters to preserve the model’s original parameters.
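To make the adapter idea concrete, here is a minimal, dependency-free sketch of one common adapter design: a residual bottleneck placed after a frozen layer. The abstract does not say which adapter variant the thesis uses, so the bottleneck structure, the zero initialization of the up-projection, and every name below are assumptions for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class BottleneckAdapter:
    """Residual adapter: h + up(relu(down(h))). Only `down` and `up` would be trained."""

    def __init__(self, dim: int, bottleneck: int) -> None:
        # Small constant init for the down-projection (illustrative choice).
        self.down: Matrix = [[0.01] * dim for _ in range(bottleneck)]
        # Zero init for the up-projection, so the adapter starts as the identity.
        self.up: Matrix = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, v) for v in matvec(self.down, h)]  # ReLU in the bottleneck
        delta = matvec(self.up, z)
        return [hi + di for hi, di in zip(h, delta)]  # residual connection


def adapted_layer(frozen_w: Matrix, adapter: BottleneckAdapter, x: Vector) -> Vector:
    """Apply a frozen linear layer, then the trainable adapter; frozen_w is never modified."""
    return adapter(matvec(frozen_w, x))


frozen_w = [[1.0, 2.0], [3.0, 4.0]]  # stands in for pretrained weights
adapter = BottleneckAdapter(dim=2, bottleneck=1)
initial_output = adapted_layer(frozen_w, adapter, [1.0, 1.0])
```

With the up-projection at zero, the adapted layer initially reproduces the frozen layer's output exactly; training then changes only the adapter matrices, which is how the pretrained parameters are preserved.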
