
Vision-based Approaches for Surgical Tool Pose Estimation in Minimally Invasive Robotic Surgery

Sabrina Gennaro

Vision-based Approaches for Surgical Tool Pose Estimation in Minimally Invasive Robotic Surgery.

Supervisor: Kristen Mariko Meiburger. Politecnico di Torino, Master's degree programme in Biomedical Engineering, 2025

PDF (Tesi_di_laurea) - Thesis
License: Not public - Private / restricted access.

Abstract:

In recent years, Robotic-Assisted Minimally Invasive Surgery (RMIS) has led to significant improvements in surgical precision and patient safety. Accurate 6D pose estimation of surgical tools is a fundamental enabler for several critical capabilities that enhance both human-in-the-loop and semi-autonomous interventions. The da Vinci Research Kit (dVRK) offers an open-source platform to study these tasks in both simulated and real environments. As part of an ongoing research effort, this thesis focuses on pose estimation as a key component in the automation of tasks such as suturing using the dVRK. This work investigates and compares two strategies for 6D pose estimation of dVRK instruments: a Marker-based approach and a Model-based, Marker-Less solution based on Deep Learning.

The Marker-based method uses a printable cylindrical marker and the EPnP algorithm to compute the 6D pose from 2D–3D correspondences. It was applied in both simulated and real scenarios, delivering robust and accurate results. In addition to serving as a stand-alone solution, it also provided ground truth (GT) annotations to evaluate the learning-based approach.

The Marker-Less method is based on FoundationPose, a framework that compares a cropped RGB image of the tool with its CAD model to regress its 6D pose. It uses two networks: a pose refinement module that generates pose candidates from image-model alignment, and a pose selection module that ranks and selects the best hypothesis.

In this study, three datasets were used: two simulated ones (one per tool: Needle Driver and Cadiere Forceps) and a real-world one. In simulation, 150 frames per tool were generated using Unity, with the application of realistic textures and anatomical backgrounds derived from real surgical scenarios. GT was obtained via the Marker-based method. Unity also provided the inputs required by FoundationPose, including segmentation masks, absolute depth maps, and camera intrinsics.

The real-world dataset consists of 150 frames of the Needle Driver, acquired in a laboratory setting using the dVRK system. GT was again computed using the Marker-based approach, with a printed marker mounted along the tool's shaft. Since automated generation of segmentation and depth data is not available in this setting, masks were manually created using Roboflow, while depth maps were generated using Depth Anything and then converted into absolute values.

Results show that the Marker-based method provided consistent and accurate pose estimates in both domains, serving as a solid reference throughout the study. FoundationPose showed promising performance in simulation, with rotational estimates corresponding to cosine similarity values of 0.83–0.87 (~9–21° rotational errors) and positional errors of 1–1.5 cm on the x and y axes. Depth estimation was less accurate, with errors of 6–7 cm. In real settings, performance declined moderately, with a cosine similarity of 0.78–0.81 (~21–27° rotational errors), positional errors increasing to 2–3 cm, and depth errors of 16 cm, likely due to the approximated depth data.

In conclusion, the Marker-based method proved to be an effective solution for both pose estimation and GT annotation. FoundationPose showed potential for Marker-Less estimation but also revealed limitations, especially in real-world use. The small size and fine structure of the instruments increase sensitivity to error, while symmetric geometries can lead to rotational ambiguities. Nevertheless, targeted fine-tuning on domain-specific data could improve performance.
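
The abstract reports rotational and per-axis positional errors of an estimated pose against the marker-based GT, but does not state how these are computed. The following is a minimal sketch of one common way to compare two 6D poses, assuming each pose is given as a 3x3 rotation matrix plus a translation vector. The function names (`rotation_error_deg`, `translation_error_per_axis`, `rot_z`) and the sample values are hypothetical and are not taken from the thesis; in particular, the geodesic angle used here is not necessarily the cosine-similarity metric the thesis reports.

```python
import math

def rotation_error_deg(R_gt, R_est):
    """Geodesic angle in degrees between two 3x3 rotation matrices (nested lists)."""
    # trace(R_gt^T @ R_est) equals the sum of element-wise products of the two matrices.
    trace = sum(R_gt[i][j] * R_est[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def translation_error_per_axis(t_gt, t_est):
    """Absolute error along x, y and z, in the same unit as the inputs."""
    return tuple(abs(e - g) for g, e in zip(t_gt, t_est))

def rot_z(deg):
    """Helper for the example: rotation of `deg` degrees about the z axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

if __name__ == "__main__":
    # Hypothetical poses: GT from a marker, estimate from a learned model (metres).
    R_gt, t_gt = rot_z(0.0), (0.10, 0.20, 0.50)
    R_est, t_est = rot_z(20.0), (0.11, 0.19, 0.56)
    print("rotation error [deg]:", rotation_error_deg(R_gt, R_est))
    print("translation error [m]:", translation_error_per_axis(t_gt, t_est))
```

Reporting the translation error per axis, rather than as a single Euclidean norm, keeps the depth (z) error separate from the in-plane (x, y) error, which matches how the abstract presents its results.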

Supervisors: Kristen Mariko Meiburger
Academic year: 2024/25
Publication type: Electronic
Number of pages: 86
Subjects:
Degree programme: Master's degree programme in Biomedical Engineering
Degree class: New system > Master's degree > LM-21 - BIOMEDICAL ENGINEERING
Collaborating companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/36209