Politecnico di Torino

A Study on Deep Learning Approaches for Visual Geo-localization

Riccardo Mereu

A Study on Deep Learning Approaches for Visual Geo-localization. Supervisors: Barbara Caputo, Carlo Masone. Politecnico di Torino, Master's degree course in Data Science And Engineering, 2021.

Full text: PDF (Tesi_di_laurea), thesis, 19 MB
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:

Many applications in Artificial Intelligence require an autonomous system to locate itself in the real world accurately and efficiently. The task of Visual Geo-localization (VG) can be formulated as recognizing the geographical location where a picture was taken, using only its visual information and comparing it to a database of geotagged images that represent previously visited places or the area under analysis. Over the last two decades, this field has seen rapid growth in interest and technical development across different research communities. As a consequence, the research landscape has become increasingly fragmented and disconnected. The first half of this thesis consists of an extensive survey of Deep Learning methods for VG and the development of a benchmarking framework. This effort aims to create a clear and fair evaluation protocol for VG methods, to provide a complete and flexible training platform, and to establish good practices for real-world applications. Every technique under analysis is accompanied by an exhaustive set of experiments, and its performance is evaluated on six well-established VG datasets to assess its generalization and robustness.

The second half of this work focuses on an extension of the classical VG task in which the inputs to the system are short sequences of frames instead of single images. The aim is to extend current VG architectures with temporal and multi-view information. This approach is of particular interest for mobile robotics, autonomous vehicles, and augmented and virtual reality applications, which inherently deal with streams of visual data. In particular, this work investigates architectures based on self-attention mechanisms and Vision Transformers. The focus is on the sequence-based VG problem, formulated as matching a query sequence to a shortlist of database sequences depicting the same location.
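To illustrate the retrieval formulation described in the abstract, here is a minimal Python sketch. It assumes that every image has already been mapped to a global descriptor vector by some network, which is not shown. All function names, descriptors and coordinates below are hypothetical toy values. Single-image VG is treated as nearest-neighbour search by cosine similarity, which returns a shortlist of the k closest database images; the geotag of the top match serves as the predicted location. For the sequence variant, per-frame descriptors are combined by plain mean pooling. This is only a placeholder baseline and not the self-attention or Vision Transformer aggregation investigated in the thesis.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def retrieve(query: Sequence[float], database: List[Vector], k: int = 5) -> List[int]:
    """Return the indices of the k database descriptors most similar to the query.

    Both sides are L2-normalized, so the dot product equals cosine similarity.
    """
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(d))) for d in database]
    ranked = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def sequence_descriptor(frames: List[Vector]) -> Vector:
    """Hypothetical baseline: mean of normalized frame descriptors, re-normalized.

    Stands in for a learned (e.g. attention-based) sequence aggregator.
    """
    if not frames:
        raise ValueError("a sequence needs at least one frame")
    normed = [l2_normalize(f) for f in frames]
    dim = len(normed[0])
    mean = [sum(f[j] for f in normed) / len(normed) for j in range(dim)]
    return l2_normalize(mean)


if __name__ == "__main__":
    # Toy 3-D descriptors for three geotagged database images (made-up values).
    database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    geotags: List[Tuple[float, float]] = [(45.06, 7.66), (45.07, 7.68), (45.05, 7.69)]

    shortlist = retrieve([0.1, 0.9, 0.0], database, k=2)
    print(shortlist)              # [1, 0]
    print(geotags[shortlist[0]])  # (45.07, 7.68), the predicted location

    # Sequence variant: match a query sequence against database sequences.
    db_sequences = [
        [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
        [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
        [[0.0, 0.0, 1.0], [0.0, 0.1, 0.9]],
    ]
    db_descs = [sequence_descriptor(s) for s in db_sequences]
    query_seq = [[0.0, 1.0, 0.2], [0.1, 0.8, 0.0]]
    print(retrieve(sequence_descriptor(query_seq), db_descs, k=1))  # [1]
```

The exhaustive search above is linear in the database size; with large databases, approximate nearest-neighbour indexes are commonly used in its place, while the shortlist logic stays the same.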

Supervisors: Barbara Caputo, Carlo Masone
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 126
Degree course: Master's degree course in Data Science And Engineering
Degree class: New organization > Master of Science > LM-32 - Computer Systems Engineering
Collaborating companies: Unspecified
URI: http://webthesis.biblio.polito.it/id/eprint/21189