Claudio Macaluso
Adapting Vision-Language Models for Open-Vocabulary Object Detection through Prompt Learning.
Supervisors: Barbara Caputo, Fabio Cermelli, Gabriele Rosi. Politecnico di Torino, Master's degree programme in Computer Engineering, 2025
PDF (Tesi_di_laurea)
- Thesis
License: Creative Commons Attribution Non-commercial No Derivatives. Download (28MB) |
| Abstract: | In recent years, foundational vision-language models have opened new opportunities for addressing open-vocabulary object detection, with applications such as automatic image annotation. However, despite their generalization ability, these models often lack the specialization required to adapt efficiently to novel datasets or domains, especially in low-data regimes. This thesis investigates the use of prompt learning techniques, originally developed in the natural language processing field, to enhance the adaptability of vision-language models for object detection. By leveraging the intrinsic fusion of text and visual modalities in these architectures, we extend current baselines with prompt-based methods and evaluate their performance in few-shot setups. The results demonstrate that the proposed approaches consistently outperform the baseline method, showing the potential of prompt learning to specialize foundational models for the task of image annotation with minimal supervision. |
|---|---|
| Supervisors: | Barbara Caputo, Fabio Cermelli, Gabriele Rosi |
| Academic year: | 2025/26 |
| Publication type: | Electronic |
| Number of pages: | 63 |
| Subjects: | |
| Degree programme: | Master's degree programme in Computer Engineering |
| Degree class: | New regulations > Master's degree > LM-32 - Computer Engineering |
| Collaborating companies: | FOCOOS AI S.R.L. |
| URI: | http://webthesis.biblio.polito.it/id/eprint/37729 |



Creative Commons License - Attribution 3.0 Italy