
Synthetic-to-Real Domain Transfer with Joint Image Translation and Discriminative Learning for Pedestrian Re-Identification

Antonio Dimitris Defonte


Supervisors: Barbara Caputo, Mirko Zaffaroni. Politecnico di Torino, Master's degree course in Data Science and Engineering, 2022

PDF (Tesi_di_laurea) - Thesis (23MB)
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:

Person re-identification is a challenging computer vision task in which each probe pedestrian must be matched to the corresponding images in a gallery set. Variations in pose, viewpoint, and illumination are well-known difficulties; even so, recent developments have shown positive results when models are trained and tested on the same dataset. However, different datasets exhibit such dissimilar characteristics that they effectively define distinct domains, and achieving good performance with cross-domain approaches has proven far more demanding than training standard supervised methods. Models that bridge the gap across domains have drawn significant attention because, from a practical perspective, annotating new data is error-prone and time-consuming, whereas collecting unlabeled images is much cheaper. Moreover, the emerging field of synthetic pedestrian re-identification is gaining momentum: instead of recording real-world data, the environments are computer-generated. On top of easing the annotation process, this allows far more control over the scene than real-world footage offers. Synthetic data also addresses ethical issues such as recording people without authorization and exploiting those videos for sensitive applications. The objective of this work was to generalize from our synthetic dataset GTASynthReid, built entirely with the graphics engine of Grand Theft Auto V, to real-world data. Starting from these motivations, we adopted a generative approach that performs synth-to-real image translation and jointly learns pedestrian feature descriptors, injecting target-domain information into a network trained on the source identities. To the best of our knowledge, we are the first to adopt the Contrastive Unpaired Translation framework for this task.
Instead of learning via "cycle consistency", this framework encourages corresponding patches of the input and output images to be similar, enabling "one-way" translation. We also designed a feature matching loss for the discriminator to improve performance. We show that, although current state-of-the-art methods reach scores that remain difficult to match, our pipeline achieves results comparable to, and in some cases better than, earlier similar approaches, on both real and synthetic data. We also show that the similarity between our dataset and each target domain increases after image translation.
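The two losses mentioned above can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names, the temperature `tau`, and the use of plain feature arrays in place of encoder/discriminator activations are all illustrative assumptions. The first function shows the patch-wise contrastive idea behind Contrastive Unpaired Translation (each translated patch should match the source patch at the same location and differ from patches elsewhere); the second shows a generic discriminator feature matching loss.

```python
import numpy as np

def patch_nce_loss(feat_src, feat_tgt, tau=0.07):
    """Patch-wise contrastive (PatchNCE-style) loss sketch.

    feat_src, feat_tgt: (N, D) arrays of N patch features of dimension D,
    extracted at the same spatial locations from the source image and its
    translation. The patch at the same index is the positive; all other
    patches act as negatives.
    """
    # L2-normalize so that dot products are cosine similarities
    src = feat_src / np.linalg.norm(feat_src, axis=1, keepdims=True)
    tgt = feat_tgt / np.linalg.norm(feat_tgt, axis=1, keepdims=True)
    # Row i compares translated patch i against every source patch
    logits = tgt @ src.T / tau                          # (N, N)
    # Numerically stable log-softmax over each row
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    # Cross-entropy with the diagonal (same location) as the positive class
    return -np.mean(np.diag(log_probs))

def feature_matching_loss(feats_real, feats_fake):
    """Feature matching loss sketch: mean L1 distance between the
    discriminator's intermediate activations for real and translated
    images, averaged over the chosen layers."""
    return sum(np.mean(np.abs(r - f))
               for r, f in zip(feats_real, feats_fake)) / len(feats_real)
```

As a sanity check, perfectly aligned patch features yield a lower contrastive loss than misaligned ones, and identical discriminator activations give a feature matching loss of zero.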

Supervisors: Barbara Caputo, Mirko Zaffaroni
Academic year: 2021/22
Publication type: Electronic
Number of pages: 119
Subjects:
Degree course: Master's degree course in Data Science and Engineering
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: FONDAZIONE LINKS
URI: http://webthesis.biblio.polito.it/id/eprint/23431