
Estimating Depth Images from Monocular Camera with Deep Learning for Service Robotics Applications

Luisa Sangregorio

Supervisors: Marcello Chiaberge, Mauro Martini. Politecnico di Torino, Master's degree programme in Mechatronic Engineering (Ingegneria Meccatronica), 2022

PDF (Tesi_di_laurea) - Thesis, 30 MB
Licence: Creative Commons Attribution Non-commercial No Derivatives.

Estimating depth information from images is a fundamental and critical task in computer vision, as it underpins a wide range of applications such as simultaneous localization and mapping, navigation, object detection, and semantic segmentation. Depth extraction can be addressed with different techniques: geometry-based (stereo matching, structure from motion), sensor-based (LiDAR, structured light, time-of-flight), and deep learning-based. In particular, monocular depth estimation is the task of predicting a depth map from just a single RGB image, which significantly reduces cost and power consumption for robotics and embedded devices. However, it is often described as an ill-posed problem, since infinitely many 3D scenes can project to the same 2D image. Recently, thanks to the rapid development of deep neural networks, monocular depth estimation via Deep Learning (DL) with Convolutional Neural Networks (CNNs) has attracted considerable attention and demonstrated promising, accurate results. With the aim of enabling depth-based real-time applications, this work focuses on lightweight networks, comparing two different CNN models. At the same time, great effort has also been devoted to choosing models able to reach state-of-the-art performance on widely used datasets such as NYU Depth and KITTI. Since the behaviour of a neural network depends heavily on its training data, and the intended context for this network is indoors, two custom datasets have been gathered by capturing images from a mobile robot (TurtleBot3) to better fit the target environment. The first, acquired with an Intel RealSense D435i, consists of pairs of aligned RGB and depth images. The second, acquired with a Stereolabs ZED 2, consists of left-right stereo pairs together with depth and disparity ground truth.
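For a stereo dataset such as the ZED 2 one, disparity ground truth relates to metric depth through the standard pinhole-stereo relation depth = f · B / d, where f is the focal length in pixels, B the baseline in metres, and d the disparity in pixels. A minimal NumPy sketch of this conversion (the helper name and the epsilon guard are illustrative, not taken from the thesis):

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) to metric depth: depth = f * B / d.

    disparity : array of disparity values in pixels
    focal_px  : focal length in pixels
    baseline_m: stereo baseline in metres
    eps       : guard against division by zero where disparity is invalid
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    return (focal_px * baseline_m) / np.maximum(disparity, eps)
```

For example, with a 700 px focal length and a 12 cm baseline, a 70 px disparity corresponds to 1.2 m of depth.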
The first model, FastDepth, aims at obtaining a dense depth map in a supervised manner while keeping complexity and computational effort low. The supervised approach has some weaknesses, since it requires high-quality depth data across a range of environments. The second, Monodepth, casts depth extraction as an image-reconstruction problem: it learns to predict a disparity map from a single image by minimising the photometric error. In this case, the learning process is fully unsupervised and needs only stereo pairs during training. Moreover, to improve the predicted depth map for collision-avoidance tasks, the loss function has been regulated by a weighting map that emphasises the nearest obstacles. To enhance the effectiveness of these frameworks in this specific case study, the networks have been trained on the collected custom datasets, both from scratch and by fine-tuning pre-trained weights. Finally, the results have been analysed qualitatively and quantitatively, using the main evaluation metrics (RMSE, RMSE log, Abs Rel, Sq Rel, and threshold accuracy) between the estimated depth and the ground truth. The effectiveness of the predicted depth map has been tested by developing a simple obstacle-avoidance navigation algorithm on a TurtleBot3. A natural progression of this work is the integration of the depth estimator with advanced autonomous navigation systems in support of indoor service robotics tasks.
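The evaluation metrics named above (RMSE, RMSE log, Abs Rel, Sq Rel, and threshold accuracy δ < 1.25ⁿ) follow the convention standard in the monocular depth literature. A minimal NumPy sketch, assuming pred and gt are already masked to valid, positive depth values (the function name is illustrative):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular depth evaluation metrics.

    pred, gt: arrays of positive depth values (metres), same shape,
    restricted to valid ground-truth pixels.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Threshold accuracy: fraction of pixels with max(gt/pred, pred/gt) < 1.25^n
    ratio = np.maximum(gt / pred, pred / gt)
    delta1 = (ratio < 1.25).mean()
    delta2 = (ratio < 1.25 ** 2).mean()
    delta3 = (ratio < 1.25 ** 3).mean()

    rmse = np.sqrt(np.mean((gt - pred) ** 2))                      # RMSE
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))  # RMSE log
    abs_rel = np.mean(np.abs(gt - pred) / gt)                      # Abs Rel
    sq_rel = np.mean(((gt - pred) ** 2) / gt)                      # Sq Rel

    return {"rmse": rmse, "rmse_log": rmse_log, "abs_rel": abs_rel,
            "sq_rel": sq_rel, "delta1": delta1, "delta2": delta2,
            "delta3": delta3}
```

Error metrics are lower-is-better, while the δ accuracies are higher-is-better; a perfect prediction yields zero error and δ₁ = 1.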

Supervisors: Marcello Chiaberge, Mauro Martini
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 86
Degree programme: Master's degree programme in Mechatronic Engineering (Ingegneria Meccatronica)
Degree class: New organization > Master of Science > LM-25 - AUTOMATION ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/22847