Towards Egocentric Scene Graph Understanding with Graph Neural Networks

Maria Rosa Scoleri

Towards Egocentric Scene Graph Understanding with Graph Neural Networks.

Supervisors: Tatiana Tommasi, Antonio Alliegro. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Egocentric vision is a domain of computer vision centered on video data captured by wearable devices such as head-mounted cameras. Videos recorded from the user's viewpoint offer unique insights into human behavior and environmental context, with applications in augmented reality, activity recognition, and human-computer interaction. This thesis aims to develop a model that extracts relevant features from egocentric videos by exploiting labels constructed from scene graphs, which summarize the content of a given frame as verb-object-relationship triplets. Moreover, we propose a novel approach to the action anticipation task that uses graph-structured encoded data. We employ a Graph Neural Network (GNN) in which visual features extracted from video frames serve as the nodes, while edges model the relationships between them.
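
To make the idea of "features as nodes, relationships as edges" concrete, the following is a minimal sketch of a single round of message passing with mean aggregation. It is not the architecture used in the thesis: the function name `mean_aggregate`, the node names `n0`, `n1`, `n2`, the two-dimensional features, and the undirected edges are all assumptions chosen for illustration, and learnable weights are omitted.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean_aggregate(
    node_features: Dict[str, Vector],
    edges: List[Tuple[str, str]],
) -> Dict[str, Vector]:
    """One round of mean-aggregation message passing (no learnable weights)."""
    # Build undirected adjacency lists from the edge list.
    neighbors: Dict[str, List[str]] = {name: [] for name in node_features}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)

    updated: Dict[str, Vector] = {}
    for name, feat in node_features.items():
        # Average the node's own feature with those of its neighbors,
        # so an isolated node simply keeps its feature.
        group = [feat] + [node_features[n] for n in neighbors[name]]
        updated[name] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated


# Hypothetical toy graph: three nodes with 2-d visual features.
features = {"n0": [1.0, 0.0], "n1": [0.0, 1.0], "n2": [1.0, 1.0]}
edges = [("n0", "n1"), ("n1", "n2")]
updated = mean_aggregate(features, edges)
```

After one round, each node's feature mixes in information from its direct neighbors; stacking several rounds (with learned transformations in a real GNN) lets information travel further across the graph.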

The GNN is trained with verb-object-relationship triplets as labels, allowing the model to learn frame features that are relevant to egocentric tasks.
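
One plausible way to turn triplets into training labels is a multi-hot target vector per frame, sketched below. This is an assumption made for illustration, not the encoding described in the thesis: the helper names `build_vocab` and `multi_hot`, the element order inside each triplet, and the placeholder strings are all hypothetical.

```python
from typing import Dict, List, Tuple

# A triplet of three strings; the element order is an assumption.
Triplet = Tuple[str, str, str]


def build_vocab(triplets: List[Triplet]) -> Dict[Triplet, int]:
    """Assign a stable index to every distinct triplet."""
    return {t: i for i, t in enumerate(sorted(set(triplets)))}


def multi_hot(frame_triplets: List[Triplet], vocab: Dict[Triplet, int]) -> List[float]:
    """Encode the triplets present in one frame as a multi-hot vector."""
    target = [0.0] * len(vocab)
    for t in frame_triplets:
        target[vocab[t]] = 1.0
    return target


# Placeholder triplets standing in for scene-graph annotations.
all_triplets = [("a", "x", "r1"), ("b", "y", "r2"), ("a", "x", "r1")]
vocab = build_vocab(all_triplets)
target = multi_hot([("b", "y", "r2")], vocab)
```

A vector like `target` could then be compared against the model's per-frame predictions with a multi-label loss such as binary cross-entropy.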