Claudio Macaluso
Adapting Vision-Language Models for Open-Vocabulary Object Detection through Prompt Learning.
Supervisors: Barbara Caputo, Fabio Cermelli, Gabriele Rosi. Politecnico di Torino, Master of Science program in Computer Engineering, 2025
PDF (Tesi_di_laurea)
- Thesis
Licence: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
In recent years, foundational vision-language models have opened new opportunities for addressing open-vocabulary object detection, with applications such as automatic image annotation. However, despite their generalization ability, these models often lack the specialization required to adapt efficiently to novel datasets or domains, especially in low-data regimes. This thesis investigates the use of prompt learning techniques, originally developed in the natural language processing field, to enhance the adaptability of vision-language models for object detection. By leveraging the intrinsic fusion of text and visual modalities in these architectures, we extend current baselines with prompt-based methods and evaluate their performance in few-shot setups. The results demonstrate that the proposed approaches consistently outperform the baseline method, showing the potential of prompt learning to specialize foundational models for the task of image annotation with minimal supervision.