Jacopo Spaccatrosi
CLIP-MD: Causal and Low Latency Inference for Procedural Mistake Detection.
Supervisors: Francesca Pistilli, Giuseppe Bruno Averta, Gaetano Salvatore Falco. Politecnico di Torino, Master of Science program in Computer Engineering, 2026
PDF (Tesi_di_laurea) - Thesis
Licence: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Procedural Mistake Detection (PMD) aims to identify deviations from an expected multi-step workflow while a user is performing a task. In egocentric settings, this requires strictly causal inference, low latency, and robustness to visual noise such as occlusions and fine-grained hand–object interactions. In this thesis, PMD is formulated as a One-Class Classification problem: models are trained only on correct executions and must detect any deviation at inference time. We adopt a dual-branch online pipeline where (i) a step recognition branch performs Online Action Detection (OAD) and (ii) a step anticipation branch predicts the next expected step from the recognised history; a mistake is signalled when executed and expected steps disagree.
For online step recognition, we analyse lightweight recurrent baselines and agnostic multimodal LLM-based approaches, showing that the latter remain unsuitable for real-time OAD due to limited fine-grained accuracy and low throughput.
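The dual-branch decision rule described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `fit_transitions` and `detect_mistakes` are hypothetical, the recognition branch is stood in for by any per-frame callable, and the anticipation branch is replaced by a simple first-order transition lookup learned only from correct executions (in keeping with the one-class setting). The loop is causal in that each decision uses only frames seen so far.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Set, Tuple

# Hypothetical stand-in for the anticipation branch: a first-order
# transition table built only from correct executions (one-class training).
def fit_transitions(correct_runs: Iterable[Sequence[int]]) -> Dict[Optional[int], Set[int]]:
    allowed: Dict[Optional[int], Set[int]] = {}
    for run in correct_runs:
        prev: Optional[int] = None  # None marks the start of a procedure
        for step in run:
            allowed.setdefault(prev, set()).add(step)
            prev = step
    return allowed

def detect_mistakes(
    frames: Iterable[object],
    recognise_step: Callable[[object], int],
    allowed: Dict[Optional[int], Set[int]],
) -> List[Tuple[int, int]]:
    """Causal loop: flag (frame_index, step) when the recognised step
    disagrees with the steps expected from the recognised history."""
    history: List[int] = []
    mistakes: List[Tuple[int, int]] = []
    for t, frame in enumerate(frames):
        step = recognise_step(frame)  # recognition branch (assumed given)
        if history and step == history[-1]:
            continue  # still inside the current step, nothing to check
        prev = history[-1] if history else None
        expected = allowed.get(prev, set())  # anticipation branch (toy lookup)
        if step not in expected:
            mistakes.append((t, step))
        history.append(step)
    return mistakes

# Toy usage: frames are already step labels, so recognition is the identity.
allowed = fit_transitions([[0, 1, 2, 3], [0, 2, 1, 3]])
ok_stream = [0, 0, 1, 1, 2, 3]
bad_stream = [0, 0, 3, 3]  # step 3 performed right after step 0
print(detect_mistakes(ok_stream, lambda f: f, allowed))
print(detect_mistakes(bad_stream, lambda f: f, allowed))
```

In this sketch a mistaken step is still appended to the history, so one deviation can trigger further flags downstream; a real system would need an explicit policy for recovering after a detected mistake.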
