
EVEgo: Egocentric Event-data for cross-domain analysis in first-person action recognition

Emanuele Gusso


Supervisors: Barbara Caputo, Mirco Planamente, Chiara Plizzari, Marcello Restelli. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2021

PDF (Tesi_di_laurea): thesis, 22 MB
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Dynamic Vision Sensors are innovative bio-inspired devices that asynchronously detect pixel-wise brightness changes, called “events”. The result is a stream of data encoding the time, pixel location and sign (polarity) of each captured intensity change. This novel acquisition method provides significant advantages over conventional sensors, particularly in low-light and high-speed motion conditions. Their high pixel bandwidth reduces motion blur, which in wearable settings arises from the user's rapid and involuntary movements, and their wide dynamic range makes them an attractive alternative to traditional cameras in challenging robotics and computer vision scenarios. Moreover, the low latency and low power consumption of these sensors enable several new real-world applications, especially in the field of wearable devices. These properties make them well suited to addressing well-known issues associated with wearable devices, such as continuous visual stimuli and background clutter. Nevertheless, the potential of event cameras in certain areas, such as first-person action recognition, remains unexplored. The large-scale EPIC-Kitchens dataset, which includes several input modalities such as audio, RGB and optical flow, motivated us to introduce the event modality and explore its behavior from a first-person perspective. In this thesis, we extend the dataset with a synthetic event-based version, accompanied by a large benchmark of the best-known architectures for first-person action recognition. This benchmark, obtained through extensive experiments, shows how the event modality performs compared to RGB and optical flow, both on its own and in multi-modal combinations, unlocking the potential of event data in both intra- and cross-domain scenarios.
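To make the (time, pixel location, polarity) event format concrete, here is a minimal sketch of the idealized event-generation model commonly used to synthesize events from ordinary video. A pixel fires one event each time its log intensity changes by a contrast threshold C. This is an illustrative assumption, not the simulator used in the thesis. The function name `frames_to_events`, the threshold of 0.2, the `eps` offset and the linear interpolation of timestamps between two frames are all hypothetical choices. For simplicity, the sketch also discards any sub-threshold residual instead of carrying a per-pixel reference level across frames.

```python
import numpy as np

# Hypothetical event record: timestamp, column, row, polarity (+1 / -1).
EVENT_DTYPE = np.dtype([("t", np.float64), ("x", np.int32), ("y", np.int32), ("p", np.int8)])


def frames_to_events(prev_frame, next_frame, t_prev, t_next, threshold=0.2, eps=1e-3):
    """Idealized DVS sketch: emit an event for every multiple of `threshold`
    crossed by a pixel's log-intensity change between two frames."""
    if prev_frame.shape != next_frame.shape:
        raise ValueError("frames must have the same shape")
    # Work in log intensity; eps avoids log(0) on black pixels.
    log_prev = np.log(prev_frame.astype(np.float64) + eps)
    log_next = np.log(next_frame.astype(np.float64) + eps)
    delta = log_next - log_prev
    # Number of contrast-threshold crossings per pixel.
    counts = np.floor(np.abs(delta) / threshold).astype(np.int64)
    events = []
    for y, x in zip(*np.nonzero(counts)):
        d = delta[y, x]
        polarity = 1 if d > 0 else -1
        for i in range(1, counts[y, x] + 1):
            # Assume log intensity changes linearly between the two frames,
            # so the i-th crossing happens at fraction i * C / |delta|.
            frac = i * threshold / abs(d)
            events.append((t_prev + frac * (t_next - t_prev), x, y, polarity))
    # Return a time-ordered event stream.
    return np.sort(np.array(events, dtype=EVENT_DTYPE), order="t")


if __name__ == "__main__":
    prev = np.full((2, 2), 0.5)
    nxt = prev.copy()
    nxt[0, 1] = 0.5 * np.exp(0.45)   # pixel (x=1, y=0) gets brighter
    nxt[1, 0] = 0.5 * np.exp(-0.25)  # pixel (x=0, y=1) gets darker
    print(frames_to_events(prev, nxt, 0.0, 1.0))
```

In the usage example, the brighter pixel crosses the threshold twice and the darker one once, so the stream contains two positive events and one negative event, ordered by timestamp. Unchanged pixels produce no output. This sparsity is what distinguishes event data from dense RGB frames.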

Supervisors: Barbara Caputo, Mirco Planamente, Chiara Plizzari, Marcello Restelli
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 126
Degree programme: Master's degree programme in Data Science and Engineering
Degree class: New organization > Master of Science > LM-32 - Computer Systems Engineering
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/20592