Politecnico di Torino

Boundary Conditions for Human Gaze Estimation on a Social Robot: Evaluation of the State-of-the-Art Models and Implementation of Joint Attention Mechanism

Nicola Scarano


Rel. Andrea Bottino. Politecnico di Torino, Corso di laurea magistrale in Data Science and Engineering, 2023

License: Creative Commons Attribution - NonCommercial - NoDerivatives.


Humans are highly effective collaborators, able to coordinate quickly with each other, often without the need for detailed guidance or instructions. This is because they can rely on a variety of communication cues to guide interactions, both explicit, such as gestures or words, and implicit, such as gaze. Gaze is a powerful nonverbal communication cue in humans and can convey different types of information: interest, engagement, and attention in social interactions are just a few. Social robotics is a discipline focused on building robots with strong communication capabilities, able to interact with each other, with humans, and with the surrounding environment. These robots can benefit enormously from the ability to recognize the line of sight of their human partners. By knowing where humans are looking, social robots can share the attention of others and behave in a more human-like way, increasing the trust of their partners. Recent advances in deep-learning-based gaze estimation have increased the relevance of such cues in social robotics. A combination of good performance, scalability, and low cost makes these deep learning methods an effective alternative to older technologies (e.g., eye-tracking glasses). On the other hand, while the literature offers several papers on datasets and algorithms, there are few studies on the application of appearance-based gaze estimation in Human-Robot Interaction (HRI) scenarios. Standardized, well-defined experiments are needed to thoroughly evaluate the performance of these models in a social setting. In this thesis, we present an experiment that investigates, in a social interaction scenario, the performance of the most relevant gaze estimation methods currently available in the literature. During the experiment, images of people looking at different targets are captured by two cameras located in front of them.
The experiment is conducted in a laboratory environment designed to resemble a human-robot interaction scene as closely as possible. The captured images, together with their ground-truth annotations, are then used to build a small dataset for evaluating the algorithms. In this work, we test two deep learning models (L2CS and ETH) trained on the main datasets available today: Gaze360 [2019] and ETH-XGaze [2021]. The models are tested on our dataset, and the results are analyzed with statistical and graphical tools, allowing us to extract important insights into the performance of models and datasets in a social interaction scenario. In the last part of this work, we fine-tune L2CS on our dataset, increasing its performance on our specific task. The fine-tuned model is then used to implement a joint attention behavior on the SoftBank Pepper robot, enabling it to respond in real time to human gaze.
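The abstract does not spell out the evaluation metric, but appearance-based gaze estimators such as L2CS typically output pitch/yaw angles, and their accuracy is commonly summarized as the angular error between the predicted and ground-truth 3D gaze vectors. A minimal sketch of that standard metric (function names and the pitch/yaw convention are illustrative, not taken from the thesis):

```python
import numpy as np

def pitchyaw_to_vector(pitch, yaw):
    """Convert pitch/yaw angles (radians) to a 3D unit gaze vector.

    Convention (assumed here): the camera looks along -z, so a gaze of
    pitch = yaw = 0 points straight at the camera, i.e. (0, 0, -1).
    """
    return np.array([
        -np.cos(pitch) * np.sin(yaw),
        -np.sin(pitch),
        -np.cos(pitch) * np.cos(yaw),
    ])

def angular_error_deg(pred, true):
    """Angle between predicted and ground-truth gaze vectors, in degrees."""
    cos_sim = np.dot(pred, true) / (np.linalg.norm(pred) * np.linalg.norm(true))
    # Clip to guard against floating-point values slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0)))
```

Averaging `angular_error_deg` over all images of a target gives the mean angular error per model, which can then be compared across targets and camera positions.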

Supervisor: Andrea Bottino
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 66
Degree programme: Corso di laurea magistrale in Data Science and Engineering
Degree class: Master of Science > LM-32 - Computer Systems Engineering
Joint supervision with: Vrije Universiteit Amsterdam (Netherlands)
Collaborating institution: Vrije Universiteit Amsterdam
URI: http://webthesis.biblio.polito.it/id/eprint/28005