
Meta-Learning for Cross-Domain One-Shot Object Detection

Salvatore Polizzotto

Meta-Learning for Cross-Domain One-Shot Object Detection.

Supervisors: Tatiana Tommasi, Francesco Cappio Borlino. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2021

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Computer Vision research today focuses increasingly on deep learning, the machine learning paradigm that currently produces the best results. Its main feature is that a single network can be trained end to end both to extract features from the data and to build models that exploit them to solve a given task. There is therefore no need to hand-craft the features passed to the final model, which overcomes the problems of shallow approaches and yields a much lower error rate. The main limitation is that deep networks have many parameters and therefore need a large amount of labeled data to be trained, which is not always available. Another major problem is that many models are trained from scratch for a specific task with a fixed learning algorithm, which makes them unusable for other applications.

An alternative paradigm attracting considerable research interest is Meta-Learning, also known as learning to learn. Its goal is to make the network capable of modeling the learning algorithm by itself. In practice, the network is meta-trained on a large number of different subtasks, producing a model that is ready to be fine-tuned on a small amount of data and can generalize to new tasks.

The focus of this master's thesis is the use of Meta-Learning for cross-domain analysis. The main challenge in this field is that models trained on one domain are mostly unusable on domains never seen before, a problem known as domain shift. More specifically, our objective is to create a visual object detector that adapts to each test sample before making predictions. To take advantage of unlabeled target samples, we exploit self-supervised learning by enriching the standard object detector with the auxiliary objective of recognizing rotations applied to the objects. Since it needs no manual annotation, this simple second task runs seamlessly on each individual test image, helping the network adapt to the style of the new instance. The core step of this work is to embed the logic of meta-learning within the self-supervised rotation task. We build on an existing method for one-shot unsupervised domain adaptation and show that meta-learning allows the detector to adapt more quickly to the individual samples, producing good results at inference time. In this way we get the best of both worlds and establish a new state of the art in detection for social media monitoring and autonomous driving scenarios.
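
The two ingredients described in the abstract can be illustrated with a small, self-contained sketch. It is not the thesis implementation: the function names (`rotate90`, `rotation_pretext_samples`, `meta_train`), the 2x2 "patch", and the scalar quadratic tasks are all hypothetical stand-ins chosen for brevity. The first part shows how a rotation task produces its own labels without annotation; the second shows a first-order meta-learning loop on toy tasks, where the meta-learned weight is the starting point from which a single adaptation step works well for every task.

```python
# Illustrative sketch only: toy stand-ins, not the thesis code.

def rotate90(grid, k):
    """Rotate a 2D list counterclockwise by k * 90 degrees."""
    out = [list(row) for row in grid]
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counterclockwise turn.
        out = [list(row) for row in zip(*out)][::-1]
    return out


def rotation_pretext_samples(grid):
    """Return (rotated_grid, label) pairs; the label is the number of turns.

    No manual annotation is needed because the label comes from the
    transformation itself.
    """
    return [(rotate90(grid, k), k) for k in range(4)]


def meta_train(task_targets, w=0.0, inner_lr=0.25, outer_lr=0.5, steps=50):
    """First-order meta-learning on toy scalar tasks.

    Assumed setup: task t has loss (w - c_t)^2 with gradient 2 * (w - c_t).
    """
    for _ in range(steps):
        meta_grad = 0.0
        for c in task_targets:
            # Inner step: adapt a copy of w to this task with one gradient step.
            w_adapted = w - inner_lr * 2.0 * (w - c)
            # Outer signal: gradient of the task loss at the adapted weight
            # (first-order approximation, ignoring second derivatives).
            meta_grad += 2.0 * (w_adapted - c)
        w -= outer_lr * meta_grad / len(task_targets)
    return w


patch = [[1, 2],
         [3, 4]]
samples = rotation_pretext_samples(patch)  # four (rotated patch, label) pairs
w_meta = meta_train([1.0, 3.0])            # two toy tasks with targets 1 and 3
```

In this toy setting the inner and outer losses are the same quadratic, whereas the abstract describes adapting through the self-supervised rotation objective before running detection; the sketch only conveys the adapt-then-evaluate structure of the loop.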

Supervisors: Tatiana Tommasi, Francesco Cappio Borlino
Academic year: 2020/21
Publication type: Electronic
Number of pages: 74
Subjects:
Degree programme: Master's degree programme in Ingegneria Informatica (Computer Engineering)
Degree class: New system > Master's degree > LM-32 - INGEGNERIA INFORMATICA
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/18186