Few Shot Adaptation of VLM for Panoptic Segmentation

Masoud Karimi

Supervisor: Andrea Bottino. Politecnico di Torino, Master's degree course in Computer Engineering (Ingegneria Informatica), 2024

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

This project aims to refine the SEEM (Segment Everything Everywhere All At Once Model) foundational model through advanced fine-tuning techniques and thorough dataset preparation. The dataset preparation process includes creating annotated ground truth images for panoptic learning and generating grounding annotation files. These annotations comprise segmentation masks and bounding boxes, with each category uniquely colored. Grounding annotations link segmentation details with descriptive sentences, establishing connections between segmentations and their context. Recently, transformers have demonstrated significant success in various computer vision domains due to their dynamic modeling capabilities and their ability to capture long-range dependencies. Vision transformers have outperformed CNN models in tasks like object detection and semantic segmentation. To incorporate domain-specific knowledge into SEEM, we focused on fine-tuning the vision backbone, the segmentation head, or both, using adapters to preserve the model’s original parameters. This approach maintains the model’s initial capabilities while integrating new knowledge. Adapters have been shown to be effective for fine-tuning large-scale models, enhancing their performance on both new and existing tasks by enabling transfer learning.
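
As an illustration of the dataset preparation step described in the abstract, the following is a minimal sketch of how a segment could be given a unique color, a bounding box, and a linked descriptive sentence. It is not taken from the thesis. It assumes a COCO-panoptic-style color encoding (id = R + 256*G + 256^2*B) applied per segment id, and the helper names and record fields (`id_to_color`, `bounding_box`, `grounding_record`, `sentence`, and so on) are hypothetical.

```python
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]
Mask = List[List[int]]  # 2D grid of segment ids; 0 is treated as unlabeled


def id_to_color(segment_id: int) -> Color:
    """Encode an id as RGB so that id = R + 256*G + 256**2*B (assumed convention)."""
    return (segment_id % 256, (segment_id // 256) % 256, (segment_id // 256**2) % 256)


def color_to_id(color: Color) -> int:
    """Inverse of id_to_color."""
    r, g, b = color
    return r + 256 * g + 256**2 * b


def bounding_box(mask: Mask, segment_id: int) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of the tightest box around a segment."""
    xs = [x for row in mask for x, value in enumerate(row) if value == segment_id]
    ys = [y for y, row in enumerate(mask) if segment_id in row]
    if not xs:
        raise ValueError(f"segment {segment_id} not present in mask")
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


def grounding_record(mask: Mask, segment_id: int, category: str, sentence: str) -> Dict:
    """Link one segment's color and box to a descriptive sentence (hypothetical schema)."""
    return {
        "segment_id": segment_id,
        "category": category,
        "color": id_to_color(segment_id),
        "bbox": bounding_box(mask, segment_id),
        "sentence": sentence,
    }


# Toy 4x4 mask with two segments (ids 7 and 300), for illustration only.
EXAMPLE_MASK: Mask = [
    [0, 0, 0, 0],
    [0, 7, 7, 0],
    [0, 7, 7, 300],
    [0, 0, 0, 300],
]

if __name__ == "__main__":
    print(grounding_record(EXAMPLE_MASK, 7, "crate", "the crate in the center"))
```

Because the color encoding is invertible, the colored ground truth image alone is enough to recover each segment id, while the record carries the box and the sentence used for grounding.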

Supervisors: Andrea Bottino
Academic year: 2023/24
Publication type: Electronic
Number of Pages: 78
Subjects:
Degree course: Master's degree course in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/31780