Vision Transformers for Surgical Scene Understanding and Skill Assessment in Minimally Invasive Robotic Suturing

Andrea Borgno

Supervisors: Kristen Mariko Meiburger, Francesco Marzola, Alberto Arezzo. Politecnico di Torino, Master's degree programme in Biomedical Engineering, 2025

PDF (Tesi_di_laurea) - Thesis
Access restricted to: staff users only until 23 July 2026 (embargo date).
License: Creative Commons Attribution Non-commercial No Derivatives.

Download (37MB)
Abstract:

Minimally invasive robotic surgery has marked a revolution in clinical practice, and a significant aspect of its future evolution involves analyzing endoscopic visual data to develop intelligent assistance functionalities. One of the main challenges is understanding the surgical workflow, which involves recognizing activities at multiple levels of granularity, from macro-level procedural phases to micro-level atomic actions. This work addresses this challenge by focusing on a paradigmatic and complex task for robotic surgery: suturing. The first contribution is the creation and annotation of the LUMICS (Multi-Level Understanding of Minimally Invasive Colon Suturing) dataset, consisting of 15 videos of suturing procedures on porcine colon, performed with the da Vinci Research Kit (dVRK) system at the MITIC laboratory, Department of Surgical Sciences, University of Turin. The primary objective is the development and validation of a computer vision pipeline for the multi-level analysis of suturing videos from LUMICS, including the recognition of 3 surgical steps, 12 atomic actions, and the segmentation of 2 surgical instruments. An additional goal is the implementation of an automatic and sensor-free system for the objective assessment of surgical skills, capable of distinguishing between expert and novice operators. The methodological core of the project is the adaptation of the TAPIS (Transformers for Actions, Phases, Steps and Instrument Segmentation) model, a state-of-the-art Transformer-based architecture selected for its effectiveness in modeling the hierarchical structure of surgical activities. The model integrates two main components: a segmentation baseline for instrument localization and a video feature extractor for spatio-temporal context modeling. After fine-tuning and optimization, the model demonstrates strong performance on the LUMICS dataset.
It achieves a mean average precision (mAP) of 98.01 % in surgical step recognition and 70.41 % in atomic action recognition. It also delivers robust instrument segmentation, with a mAP of 99.20 % and an average intersection over union (IoU) exceeding 96 %. Based on the model’s predictions, a pipeline has been developed for the automatic classification of surgical competence. For each video, three feature categories are extracted: the percentage distribution of actions, the duration of steps, and the smoothness of instrument movements (estimated via the ALDLJ metric). A Random Forest classifier trained on these features achieves an accuracy of 87 % in distinguishing between Master and Beginner surgeons. A SHAP-based interpretability analysis confirms the clinical relevance of the approach, identifying procedural slowness, movement inefficiency, and low smoothness as the strongest predictors of lower surgical skill. In conclusion, this work demonstrates the feasibility of an integrated approach for the automated understanding of the surgical scene and the objective assessment of skills in robotic suturing. These results represent a concrete step towards the integration of artificial intelligence in robotic surgery, establishing the foundation for the development of context-aware systems capable of supporting procedural analysis, facilitating surgical training, and automating post-operative documentation.
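
As an illustration of the first two feature categories mentioned in the abstract (percentage distribution of actions and duration of steps), the following is a minimal sketch of how they could be derived from per-frame predictions. It is not the thesis code: the function names, the integer label encoding, and the frame rate are assumptions made for this example, and the ALDLJ smoothness feature is left out because the abstract does not define it.

```python
from collections import Counter


def action_distribution(frame_actions, num_actions):
    """Percentage of frames assigned to each action class (hypothetical helper)."""
    if not frame_actions:
        raise ValueError("frame_actions must not be empty")
    counts = Counter(frame_actions)
    total = len(frame_actions)
    return [100.0 * counts.get(a, 0) / total for a in range(num_actions)]


def step_durations(frame_steps, num_steps, fps):
    """Total time in seconds spent in each step, given one step label per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    counts = Counter(frame_steps)
    return [counts.get(s, 0) / fps for s in range(num_steps)]


def video_features(frame_actions, frame_steps, num_actions=12, num_steps=3, fps=30.0):
    """Concatenate both feature groups into one flat vector per video.

    The defaults of 12 actions and 3 steps follow the abstract;
    fps=30.0 is an assumed value, not taken from the source.
    """
    return (action_distribution(frame_actions, num_actions)
            + step_durations(frame_steps, num_steps, fps))


# Toy example: four frames, three action classes, two step classes, 2 frames per second.
features = video_features([0, 0, 1, 2], [0, 0, 1, 1],
                          num_actions=3, num_steps=2, fps=2.0)
print(features)
```

A vector of this kind, one per video, is the sort of input a tabular classifier such as a Random Forest would take; how the thesis actually assembles and normalizes its features is not stated in the abstract.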

Supervisors: Kristen Mariko Meiburger, Francesco Marzola, Alberto Arezzo
Academic year: 2024/25
Publication type: Electronic
Number of pages: 115
Subjects:
Degree programme: Master's degree programme in Biomedical Engineering
Degree class: New system > Master's degree > LM-21 - BIOMEDICAL ENGINEERING
Collaborating organizations: UNIVERSITA' DEGLI STUDI DI TORINO
URI: http://webthesis.biblio.polito.it/id/eprint/36187