
3D indoor environment reconstruction for AR/VR applications using a smartphone device

Alessandro Sergio Manni


Rel. Andrea Sanna, Federico Manuri, Damiano Oriti. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2021

License: Creative Commons Attribution Non-commercial No Derivatives.


Recent developments in AR/VR have created new ways of remote communication and collaboration. From education to entertainment, reconstructing digital versions of real environments can enhance extended reality (XR) experiences. When a dedicated depth sensor is used to scan the environment, the result is a point cloud that usually suffers from low resolution, noise, and a number of points too high to allow a smooth experience in AR/VR applications. To overcome the limitations of a full scan of the environment, several approaches have been introduced following the success of deep learning: single-view scene reconstruction relies on a single RGB photo of a scene to reconstruct the environment. The results depend on the complexity of the scene, as it is usually difficult to capture all the objects in a room from a single photo. Moreover, this approach does not resolve the global-scale depth ambiguity that arises when depth is inferred from a 2D image.

The system presented in this thesis attempts to overcome most of the limitations of the state of the art by taking as input a single RGB image for each object the user wishes to reconstruct, captured with an Android smartphone equipped with a single RGB camera. The image of each object is augmented with depth information provided by Google ARCore. A server then processes this data to classify the object in the image and retrieve the most similar synthetic 3D CAD model from a database. The output is the reconstructed scene, in which each 3D CAD model has a 7-DoF pose estimated by the system: position, scale, and vertical rotation. The scene can also be visualized in AR on the smartphone to verify that each object matches its physical counterpart. Reconstructing a scene with 5 models takes 21 seconds.

To evaluate the accuracy of the system, a new dataset is introduced, consisting of 500 snapshots of different objects divided into 13 categories.
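As a rough illustration of the 7-DoF pose described above (3D position, one rotation about the vertical axis, and per-axis scale, giving 3 + 1 + 3 = 7 degrees of freedom), the sketch below composes such a pose into a standard 4x4 placement transform. This is not code from the thesis; the function name and the scale-then-rotate-then-translate convention are assumptions for illustration only.

```python
import math

def pose_to_matrix(position, yaw_deg, scale):
    """Compose a 7-DoF pose into a 4x4 row-major transform.

    position: (tx, ty, tz) translation
    yaw_deg:  rotation about the vertical (y) axis, in degrees
    scale:    (sx, sy, sz) per-axis scale factors

    The transform applies scale first, then the vertical rotation,
    then the translation (M = T * R_y * S).
    """
    tx, ty, tz = position
    sx, sy, sz = scale
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [
        [ c * sx, 0.0,  s * sz, tx],
        [ 0.0,    sy,   0.0,    ty],
        [-s * sx, 0.0,  c * sz, tz],
        [ 0.0,    0.0,  0.0,    1.0],
    ]
```

A renderer on the client would apply this matrix to the retrieved CAD model's vertices to place it in the reconstructed scene.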
By measuring the translation, rotation, and scaling errors of the reconstructed objects with respect to the physical objects that serve as ground truth, the proposed solution achieves a maximum error of 18 percent for scale, less than 9 centimeters for position, and less than 18 degrees for rotation. These results indicate that the proposed system can be used for XR applications, thus bridging the gap between the real and virtual worlds.
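The three error measures used in the evaluation could be computed per object along these lines. This is a minimal sketch, not the thesis's evaluation code: the function name and argument layout are hypothetical, and it assumes a single uniform scale factor per object and rotation wrapped to the shortest angular distance.

```python
import math

def pose_errors(pred_pos, gt_pos, pred_yaw_deg, gt_yaw_deg, pred_scale, gt_scale):
    """Errors of a reconstructed object against its physical ground truth.

    Returns (translation error in the same units as the positions,
    vertical-rotation error in degrees wrapped to [0, 180],
    relative scale error as a fraction).
    """
    # Euclidean distance between predicted and ground-truth positions.
    t_err = math.dist(pred_pos, gt_pos)
    # Shortest angular difference about the vertical axis.
    r_err = abs(pred_yaw_deg - gt_yaw_deg) % 360.0
    if r_err > 180.0:
        r_err = 360.0 - r_err
    # Relative scale error (e.g. 0.18 corresponds to the 18% bound above).
    s_err = abs(pred_scale - gt_scale) / gt_scale
    return t_err, r_err, s_err
```

Averaging (or taking the maximum of) these per-object values over the 500-snapshot dataset would yield summary figures of the kind reported in the abstract.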

Supervisors: Andrea Sanna, Federico Manuri, Damiano Oriti
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 84
Degree programme: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/21097