Pietro Vignini
Efficient People 4D Pose estimation and tracking for social controllers.
Supervisor: Marcello Chiaberge. Politecnico di Torino, Master's degree programme in Mechatronic Engineering (Ingegneria Meccatronica), 2024
Full text: PDF (Tesi_di_laurea), 12MB. License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
As social robots increasingly engage with humans in dynamic environments, one of the main challenges is to develop a perception system that can detect and track people’s position, velocity and orientation in real time. By integrating RGBD data with multi-object tracking frameworks, this work seeks to provide a reliable solution for 4D pose estimation. The tracking-by-detection paradigm, which separates the detection phase from the tracking phase, has established itself as one of the most widely used approaches for online, real-time multi-object tracking. Following this paradigm, three methods were studied, developed and tested, each progressively improving the tracking accuracy and the quality of the estimated 4D poses. In the first method, YOLOv8 Segmentation was combined with a modified version of the SORT tracking algorithm; the segmentation masks provided by YOLOv8 were used to extract the centroids of the people in the scene. These 2D centroids were then deprojected to 3D points using the depth map from the RGBD camera, and the resulting 3D positions were fed to the tracking algorithm. To improve identity association and reduce ID switches during the tracking phase, the second method replaced SORT with StrongSORT, a more advanced tracking algorithm that integrates a Re-ID model and uses visual features to associate detections with tracks. In both methods, a person’s orientation was obtained as the angle of the estimated velocity vector. To further refine orientation accuracy, the third method used YOLOv8 Pose to extract people’s body keypoints and estimate the orientation directly from them. Keypoints were also used to obtain the 3D positions of the detected people. To evaluate and compare the different methods, a dataset consisting of multiple RGBD videos was recorded, capturing different levels of complexity in terms of number of people and occlusions. Ground truth data was obtained with a motion capture system to quantitatively determine the accuracy of the estimated positions, velocities and orientations. The computational efficiency of each method was also measured to verify real-time capability. |
---|---|
Supervisors: | Marcello Chiaberge |
Academic year: | 2024/25 |
Publication type: | Electronic |
Number of pages: | 82 |
Subjects: | |
Degree programme: | Master's degree programme in Mechatronic Engineering (Ingegneria Meccatronica) |
Degree class: | New organization > Master's degree > LM-25 - Automation Engineering (INGEGNERIA DELL'AUTOMAZIONE) |
Collaborating companies: | Not specified |
URI: | http://webthesis.biblio.polito.it/id/eprint/33825 |
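
The abstract describes two geometric steps without giving formulas: deprojecting a 2D centroid to a 3D point using the depth map, and taking a person's orientation as the angle of the estimated velocity vector. The following is a minimal sketch of both, assuming an ideal pinhole camera with no lens distortion. The function names, the intrinsics `fx`, `fy`, `cx`, `cy`, and the `min_speed` threshold are illustrative assumptions and are not taken from the thesis.

```python
import math


def deproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in metres to a 3D point.

    Assumes an ideal pinhole model (no distortion); the result is
    expressed in the camera frame, with z along the optical axis.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def heading_from_velocity(vx, vy, min_speed=0.1):
    """Return the heading angle (radians) of a planar velocity vector.

    Returns None when the speed is below min_speed (an assumed
    threshold), since the direction of a near-zero vector is mostly noise.
    """
    if math.hypot(vx, vy) < min_speed:
        return None
    return math.atan2(vy, vx)
```

For example, `deproject_pixel(920, 240, 2.0, 600.0, 600.0, 320.0, 240.0)` places the point 2 m to the side of the optical axis at 2 m depth. The `None` return for slow targets illustrates why a velocity-based orientation is undefined for people standing still, which is one plausible motivation for the keypoint-based estimate in the third method.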