
Large-Language Models for Proactive Physical Assistance in Human-Robot Collaborative Assembly

Tiziano Balzani

Large-Language Models for Proactive Physical Assistance in Human-Robot Collaborative Assembly.

Supervisors: Alessandro Simeone, Yuchen Fan. Politecnico di Torino, Master's degree programme in Ingegneria Gestionale (Engineering and Management), 2025

Abstract:

This thesis designs and implements a human-robot collaborative assembly assistance system based on first-person vision and generative AI. The system's core objective is to identify spatial-alignment difficulties that an operator encounters during precision assembly tasks and, when necessary, to provide physical guidance through a collaborative robot. By demonstrating the correct trajectory and force, the system serves both assistive and instructional purposes. During operation, a camera integrated into the gripper continuously captures the operator's actions and movements over 10-second windows. A lightweight Vision-Language Model (VLM), governed by a carefully engineered prompt, analyzes this feed in real time to diagnose operator hesitation or error during critical steps such as shaft-hole alignment. When a difficulty is detected, the system initiates a voice interaction to confirm that assistance is needed. Once the operator consents, the UR10 collaborative robot is activated: its end-effector holds a metal bracelet that encloses the operator's forearm. Using a target pose computed from a pre-arranged setup, the robot applies an admittance control algorithm to generate compliant, safe guidance forces. This guidance does not replace the operator's action; instead, it collaboratively drives the operator's hand through the precise alignment and insertion, so that the correct procedure is learned intuitively through touch. The assembly of a long bolt into a stator housing hole is chosen as the validation task. System performance is evaluated on several metrics, including recognition accuracy for operational difficulties, success rate of human-robot guided tasks, and overall system response time.
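The admittance control step can be illustrated with a minimal sketch. It assumes a translational-only mass-spring-damper model, M·a + D·v + K·(x − x_target) = f_ext, where f_ext is the force the operator applies at the bracelet and the virtual spring pulls toward the target pose; the output is a Cartesian velocity command. The class name, the gains (M = 5 kg, D = 40 N·s/m, K = 80 N/m, chosen so the virtual system is critically damped), the 0.1 m/s speed clamp and the 2 ms cycle are all hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import List

Vec3 = List[float]


@dataclass
class AdmittanceController:
    """Translational admittance law M*a + D*v + K*(x - x_target) = f_ext.

    All gains are illustrative assumptions; with these values the
    virtual system is critically damped (D = 2*sqrt(K*M)).
    """
    mass: float = 5.0        # virtual mass M [kg] (assumed)
    damping: float = 40.0    # virtual damping D [N*s/m] (assumed)
    stiffness: float = 80.0  # virtual spring K toward the target [N/m] (assumed)
    max_speed: float = 0.1   # safety clamp on commanded speed [m/s] (assumed)
    velocity: Vec3 = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def step(self, x: Vec3, x_target: Vec3, f_ext: Vec3, dt: float) -> Vec3:
        """Return the Cartesian velocity command for one control cycle.

        x        current end-effector position [m]
        x_target target position from the pre-arranged setup [m]
        f_ext    force the operator applies at the bracelet [N]
        """
        cmd = []
        for i in range(3):
            # Solve the admittance equation for this axis's acceleration.
            acc = (f_ext[i]
                   - self.damping * self.velocity[i]
                   - self.stiffness * (x[i] - x_target[i])) / self.mass
            # Semi-implicit Euler: integrate acceleration into velocity.
            self.velocity[i] += acc * dt
            cmd.append(self.velocity[i])
        # Scale the whole vector down if it exceeds the speed limit.
        speed = sum(v * v for v in cmd) ** 0.5
        if speed > self.max_speed:
            cmd = [v * self.max_speed / speed for v in cmd]
            self.velocity = list(cmd)
        return cmd


def simulate(ctrl: AdmittanceController, x0: Vec3, x_target: Vec3,
             f_ext: Vec3, dt: float = 0.002, steps: int = 2000) -> Vec3:
    """Idealized loop: assume the robot tracks each velocity command exactly."""
    x = list(x0)
    for _ in range(steps):
        v = ctrl.step(x, x_target, f_ext, dt)
        # Position uses the freshly updated velocity (semi-implicit Euler).
        x = [xi + vi * dt for xi, vi in zip(x, v)]
    return x
```

For example, `simulate(AdmittanceController(), [0, 0, 0], [0.05, 0, 0], [0, 0, 0])` settles at the target within the simulated 4 s. With a steady opposing push of 8 N along x, the hand instead settles at x_target + f/K = 0.05 − 0.1 = −0.05 m: the virtual spring yields to the operator rather than overpowering them, which is the compliance property the guidance relies on. A real UR10 loop would also need the measured force-sensor signal, rotational axes and workspace limits, which this sketch leaves out.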
The deliverables of this work include a well-structured and documented ROS package, an effective VLM prompt-engineering strategy for diagnosing alignment difficulties, a functional integrated prototype, and a comprehensive experimental analysis of system robustness, failure modes, and scalability. The primary contribution of this research is a reproducible human-robot collaboration framework that integrates the semantic perception capabilities of VLMs with the compliant physical interaction enabled by admittance control. This closes the loop from high-level intention understanding to low-level physical interaction, providing a viable methodological and practical foundation for next-generation intelligent assistants that can proactively perceive human operational struggles and offer immediate, intuitive physical demonstration.
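The closed loop described in the abstract (VLM diagnosis, spoken confirmation, then physical guidance) can be pictured as a small state machine. This is a minimal sketch assuming three phases driven by boolean event flags; the `Phase` names, the `next_phase` and `run` functions and the event keys `difficulty`, `consent` and `aligned` are hypothetical, and the VLM, speech and robot interfaces are abstracted away entirely.

```python
from enum import Enum, auto
from typing import Dict, List, Optional


class Phase(Enum):
    MONITORING = auto()  # capture 10-second clips and ask the VLM for a diagnosis
    CONFIRMING = auto()  # voice prompt: does the operator want help?
    GUIDING = auto()     # admittance-controlled guidance toward the target pose


def next_phase(phase: Phase, *, difficulty: bool = False,
               consent: Optional[bool] = None, aligned: bool = False) -> Phase:
    """Hypothetical transition rule for one supervisor tick."""
    if phase is Phase.MONITORING:
        # Only a VLM-reported difficulty triggers the confirmation dialogue.
        return Phase.CONFIRMING if difficulty else Phase.MONITORING
    if phase is Phase.CONFIRMING:
        if consent is None:  # no spoken answer yet: keep waiting
            return Phase.CONFIRMING
        # The robot moves only after explicit consent.
        return Phase.GUIDING if consent else Phase.MONITORING
    if phase is Phase.GUIDING:
        # Hand control back once alignment and insertion are reported complete.
        return Phase.MONITORING if aligned else Phase.GUIDING
    raise ValueError(f"unknown phase: {phase}")


def run(events: List[Dict[str, bool]], phase: Phase = Phase.MONITORING) -> List[Phase]:
    """Replay a sequence of event dicts and return the visited phases."""
    trace = [phase]
    for event in events:
        phase = next_phase(phase, **event)
        trace.append(phase)
    return trace
```

Keeping consent as its own phase makes the rule "no robot motion without a yes" explicit and easy to test on its own, independently of how the perception, dialogue and control components are actually implemented.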

Supervisors: Alessandro Simeone, Yuchen Fan
Academic year: 2025/26
Publication type: Electronic
Number of pages: 90
Additional information: Restricted thesis. Full text not available
Subjects:
Degree programme: Master's degree programme in Ingegneria Gestionale (Engineering and Management)
Degree class: New degree system > Master's degree > LM-31 - INGEGNERIA GESTIONALE
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/38182