Audio-Visual Human Activity Recognition for Humanoid Robotics

Lucia Innocenti

Audio-Visual Human Activity Recognition for Humanoid Robotics.

Supervisor: Barbara Caputo. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2022

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Recent advances in robotics research are pushing machines toward faster, smarter, and more efficient devices. Notable results have been achieved from a hardware point of view, with humanoid devices that can consistently outperform human movements. From a perception perspective, however, we are still far from matching human skills. The gap is even wider when we introduce constraints typical of on-board implementations, such as limited resources and real-time requirements, which lead to models that may work perfectly in a theoretical scenario but are not deployable in real applications. One of the most important tasks that a humanoid robot must perform correctly to enable fruitful human-robot interaction is accurate human activity recognition, namely the online identification of the action that the human is performing, with the goal of triggering a consequent action, e.g. to support the task. State-of-the-art approaches typically rely on large and complex neural models, which can hardly be deployed on the constrained hardware of a fully autonomous humanoid robot. This thesis addresses the problem of human activity recognition with an eye on dimensionality, proposing a solution based on skeleton data and graph neural networks. The working scenario is a kitchen environment, with a single human and the robot in the same room. A camera, hosted in the robot head, collects videos of the human that are sent to the main model, which, after preprocessing, performs a classification task. The architecture I propose proved to be a convenient approach for node-level, edge-level, and graph-level prediction tasks. In this work, I focused on graph-level prediction, comparing different architectures for the update and message-passing functions.
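The graph-level prediction described above can be illustrated with a minimal sketch in plain Python. It is not the architecture of the thesis: the five-joint skeleton, the mean-based message function, the tanh update, the mean readout, and the names `neighbours`, `message_passing_step`, `graph_readout`, and `classify` are all assumptions chosen only to show how node features flow into a single prediction for the whole graph.

```python
import math

# Hypothetical toy skeleton: 5 joints connected by 4 bones (not the thesis layout).
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def neighbours(num_nodes, edges):
    """Build an undirected adjacency list from a list of bone edges."""
    adj = [[] for _ in range(num_nodes)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def message_passing_step(features, adj):
    """One round of message passing.

    Message: mean of the neighbours' features (assumed choice).
    Update: tanh of the average of own feature and message (assumed choice).
    """
    updated = []
    for node, feat in enumerate(features):
        nbrs = adj[node]
        dim = len(feat)
        if nbrs:
            msg = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        else:
            msg = [0.0] * dim  # isolated joint: no incoming message
        updated.append([math.tanh(0.5 * (feat[d] + msg[d])) for d in range(dim)])
    return updated


def graph_readout(features):
    """Graph-level embedding: mean of all node features."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


def classify(embedding, class_weights):
    """Linear scores per class followed by a numerically stable softmax."""
    scores = [sum(w * x for w, x in zip(ws, embedding)) for ws in class_weights]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative 2D joint coordinates for one frame and untrained, made-up weights.
joints = [[0.0, 1.0], [0.0, 0.5], [0.0, 0.0], [-0.3, 0.4], [0.3, 0.4]]
adj = neighbours(NUM_JOINTS, EDGES)
hidden = message_passing_step(joints, adj)
probs = classify(graph_readout(hidden), [[1.0, 0.0], [0.0, 1.0]])
```

Swapping the bodies of the message and update steps is where different architectures would be compared; a node-level or edge-level task would skip the readout and classify per node or per edge instead.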
The architecture achieves interesting results on the dataset used, and it is shown to be efficient when deployed directly on the robot hardware. It is also worth reporting that the model I developed includes an open-world approach: if the confidence for the predicted action is not sufficient, the robot is able to communicate that the classification task was ineffective; in this case, the prediction is performed based on data provided directly by the human through speech, which are analyzed using Natural Language Processing techniques. Future developments of this work will implement an Incremental Learning policy to expand the knowledge of the model to novel classes and will consider the adoption of sequence-based models such as long short-term memory networks.
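The open-world behaviour can be sketched as a confidence threshold with a speech fallback. This is only an illustration under stated assumptions: the threshold value, the action labels, the keyword table, and the functions `predict_or_ask` and `label_from_speech` are hypothetical, and the keyword match is a crude stand-in for the Natural Language Processing step, not the method used in the thesis.

```python
def predict_or_ask(probabilities, labels, threshold=0.6):
    """Return (label, source); label is None when confidence is too low.

    The threshold of 0.6 is an arbitrary illustrative value.
    """
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best] >= threshold:
        return labels[best], "vision"
    # Not confident enough: signal that the human should be asked.
    return None, "ask_human"


def label_from_speech(utterance, keywords):
    """Crude stand-in for the NLP step: match known keywords in the utterance."""
    words = utterance.lower().split()
    for label, triggers in keywords.items():
        if any(trigger in words for trigger in triggers):
            return label
    return None


# Hypothetical kitchen actions and trigger words.
LABELS = ["cut", "pour"]
KEYWORDS = {"cut": ["cut", "cutting"], "pour": ["pour", "pouring"]}

label, source = predict_or_ask([0.55, 0.45], LABELS)
if label is None:
    label = label_from_speech("I am pouring water", KEYWORDS)
```

In this sketch the visual prediction is rejected because 0.55 is below the threshold, so the label comes from the spoken description instead.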
Supervisors:	Barbara Caputo
Academic year:	2021/22
Publication type:	Electronic
Number of pages:	81
Subjects:
Degree programme:	Master's degree programme in Data Science And Engineering
Degree class:	New system > Master's degree > LM-32 - Computer Engineering
Joint supervision institution:	Karlsruhe Institute of Technology (KIT) Institute for Anthropomatics and Robotics (IAR) High Performance Humanoid Techno (Germany)
Collaborating companies:	Karlsruher Institut für Technologie / Karlsruhe Institute of Technology - KIT
URI:	http://webthesis.biblio.polito.it/id/eprint/22650
