
Analysis of 3D perception based on depth sensors in order to perform 3D scene understanding

Edoardo Fioriti

Analysis of 3D perception based on depth sensors in order to perform 3D scene understanding.

Supervisors: Nicola Amati, Andrea Tonoli, Stefano Feraco, Massimiliano Curti. Politecnico di Torino, Master's degree course in Mechatronic Engineering (Ingegneria Meccatronica), 2021

Abstract:

In collaboration with the Teoresi Group and Politecnico di Torino, this work explores 3D scene understanding through an analysis of 3D perception features obtained with depth sensors. We first studied the existing libraries that use depth sensors for real-world 3D learning, with the intention of looking for alternative solutions if they proved unsuitable for our purpose. The main libraries considered, all of which already work with three-dimensional environments, were TensorFlow 3D, Open3D, MediaPipe and YOLO. Initially, each library was tested with an Intel RealSense D435 depth camera connected directly to the machine. Once we understood how it worked, a Raspberry Pi 3 board was used to test the camera and the algorithm in real time. TensorFlow 3D proved unusable for our purposes because the library, released in the current year, is still immature: the new technology developed by Google is not yet ready to be easily studied and implemented. Open3D, by contrast, provided interesting new insights through its 3D reconstruction of the environment, and appears to be a library that is ready to be used and improved. It can also recognize objects, but only with a LiDAR sensor. The reconstruction feature was tested and, despite some small bugs, the result is satisfactory. With MediaPipe, and in particular its Objectron component, we studied 3D object detection. The library appeared to be valid but has some limits: it is not yet possible to train the model, so we could not enlarge the dataset. Finally, YOLO appears to be one of the best libraries for detection, even though it works only in two dimensions. After studying it, we realized that the YOLO detection could be exploited to create a 3D detection, and we developed scripts that take the 2D detection from YOLO as input and produce a 3D frame of the recognized object as output.
In addition, thanks to the D435 depth sensor, a system able to measure the distance of the detected object was implemented. The algorithm reads the distance at multiple points within the detection, and the smallest value is printed on the screen. The algorithm can be applied to images and videos as well as in real time. In the real-time case, the process needs to be sped up, since it takes more than a second to complete its calculations. As a final step, a 3D scanner was built with the help of an Arduino Uno. The scanner collects the point cloud from the camera, which is then processed by the software. The output is an STL file that can be processed and used for various purposes. The system is composed of an Arduino Uno, a stepper motor, a D435 camera, a structure made entirely with a 3D printer, and the respective connection cables. All the topics treated at the experimental level produced results that highlight the performance of the convolutional neural network models. TensorFlow and YOLO were the best-performing libraries in the training and testing phases. Their loss function values are comparable to each other and better than those of Open3D. Nevertheless, the accuracy of the models is similar, so it is not easy to declare a winner. TensorFlow seems to have a minimal advantage: it turns out to have the best 2D detection. All tests were carried out on the same machine, with the same probability and characteristics, using the same input parameters.
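
The multipoint distance estimate described above can be sketched as follows. This is only an illustrative sketch, not the thesis code: the function names `min_distance_in_box` and `deproject`, the grid size, the box format `(x_min, y_min, x_max, y_max)`, the use of depth values in metres, and the convention that a depth of 0 means "no reading" are all assumptions. The second function shows one possible way to lift a 2D pixel to a 3D point with a pinhole camera model, where the intrinsics `fx`, `fy`, `cx`, `cy` are likewise assumed to be known.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # hypothetical format: (x_min, y_min, x_max, y_max)


def min_distance_in_box(depth_m: Sequence[Sequence[float]],
                        box: Box,
                        grid: int = 5) -> Optional[float]:
    """Sample a grid x grid lattice of points inside a 2D detection box
    and return the smallest valid depth, or None if no point is valid."""
    height, width = len(depth_m), len(depth_m[0])
    x_min, y_min, x_max, y_max = box
    # Clamp the box to the image so that indexing never goes out of range.
    x_min, x_max = max(0, x_min), min(width - 1, x_max)
    y_min, y_max = max(0, y_min), min(height - 1, y_max)
    if x_min > x_max or y_min > y_max:
        return None

    samples = []
    for i in range(grid):
        for j in range(grid):
            # Evenly spaced sample points, inset from the box border.
            x = x_min + (x_max - x_min) * (j + 1) // (grid + 1)
            y = y_min + (y_max - y_min) * (i + 1) // (grid + 1)
            d = depth_m[y][x]
            if d > 0:  # assumption: 0 marks a pixel with no depth reading
                samples.append(d)
    return min(samples) if samples else None


def deproject(pixel: Tuple[float, float], depth: float,
              fx: float, fy: float, cx: float, cy: float
              ) -> Tuple[float, float, float]:
    """Map a pixel (u, v) with known depth to a 3D point in the camera
    frame using an undistorted pinhole model (assumed intrinsics)."""
    u, v = pixel
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
```

Given a 2D box from a detector and an aligned depth frame, calling `min_distance_in_box(depth, box)` returns the distance to the nearest sampled point of the object. Applying `deproject` to the box corners at that distance gives a rough 3D frame. Taking the minimum makes the estimate sensitive to foreground clutter inside the box, so a median over the samples may be a more robust choice in some scenes.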

Supervisors: Nicola Amati, Andrea Tonoli, Stefano Feraco, Massimiliano Curti
Academic year: 2021/22
Publication type: Electronic
Number of pages: 99
Additional information: Thesis under embargo. Full text not available
Subjects:
Degree course: Master's degree course in Mechatronic Engineering (Ingegneria Meccatronica)
Degree class: New system > Master's degree > LM-25 - Automation Engineering
Collaborating companies: Teoresi SPA
URI: http://webthesis.biblio.polito.it/id/eprint/21023