
Segmenting Dynamic Objects in 3D from Egocentric Videos

Francesco Borgna

Segmenting Dynamic Objects in 3D from Egocentric Videos.

Supervisors: Tatiana Tommasi, Chiara Plizzari. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Matematica, 2024

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

With the increasing availability of egocentric wearable devices, there has been a surge in first-person videos, leading to numerous studies that aim to leverage this data. Among these efforts, 3D scene reconstruction stands out as a key area of interest: it allows the scene where the video was captured to be recreated, providing invaluable support for the growing field of augmented reality applications.

Some egocentric datasets include static 3D scans of the recording locations, which usually require costly hardware or dedicated scanning sessions. An alternative approach reconstructs the scene directly from the video frames using Structure from Motion (SfM) techniques. This method not only captures the motion of the actor and of the objects they interact with, including transformations (e.g., slicing a carrot), but also enables any egocentric footage to be used for scene reconstruction, even without physical access to the environment.

However, the task of decomposing dynamic scenes into objects has received limited attention. SfM, for example, struggles to distinguish between moving and static parts, resulting in cluttered point cloud reconstructions where the same object may appear superimposed on itself or in multiple places within the scene.

In this thesis, we combine SfM with egocentric methods to segment moving objects in 3D. This is achieved by reconstructing the scene with COLMAP, an SfM algorithm, and then modifying NeuralDiff, a recent algorithm originally designed to produce 2D segmentations of static objects, foreground, and actor, so that it extracts 3D geometry. Additionally, we explore ways to reduce the overall computational cost: we simplify the NeuralDiff architecture to better fit our goals by merging the foreground and actor streams, and we develop an intelligent video frame sampling technique that captures the essence of the scene using fewer frames.
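The abstract does not specify how the intelligent frame sampling works; as an illustration only, one common family of approaches is greedy keyframe selection, where a frame is kept only if it differs enough from the last kept frame. The sketch below assumes frames are already summarized as feature vectors (a real pipeline would use image descriptors or learned embeddings); the function names and the distance threshold are hypothetical.

```python
import math


def l2_distance(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_keyframes(frame_features, threshold):
    """Greedy keyframe selection (illustrative sketch).

    Keeps the first frame, then keeps each subsequent frame only if its
    feature vector is at least `threshold` away from the last kept frame.
    Returns the indices of the kept frames.
    """
    if not frame_features:
        return []
    kept = [0]
    for i in range(1, len(frame_features)):
        if l2_distance(frame_features[i], frame_features[kept[-1]]) >= threshold:
            kept.append(i)
    return kept


# Usage: near-duplicate frames (small distances) are skipped.
features = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.05, 0.0], [2.0, 0.0]]
print(sample_keyframes(features, threshold=0.5))  # → [0, 2, 4]
```

The threshold trades reconstruction coverage against compute: a larger value keeps fewer frames, which is the goal stated in the abstract, at the risk of missing short-lived object interactions.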

Supervisors: Tatiana Tommasi, Chiara Plizzari
Academic year: 2023/24
Publication type: Electronic
Number of pages: 121
Subjects:
Degree programme: Corso di laurea magistrale in Ingegneria Matematica (Mathematical Engineering)
Degree class: New system > Master's degree > LM-44 - Mathematical-physical modelling for engineering
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/31465