Exploring Odd-One-Out Anomaly Detection

Silvio Chito


Supervisors: Tatiana Tommasi, Paolo Rabino. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	The Odd-One-Out anomaly detection problem is an emerging research direction aimed at identifying visually distinct instances within multi-object 3D scenes. Unlike traditional anomaly detection tasks, which rely on predefined notions of normality or anomaly, the Odd-One-Out task is inherently contextual: an object's status as "anomalous" is determined not by global priors but by its dissimilarity to the other objects in the same scene. This contextual framing introduces significant modeling challenges, requiring both spatial understanding of individual objects and relational reasoning about how objects relate to each other. This thesis explores how to learn context-aware object representations for 3D anomaly detection. We begin by reconstructing a 3D voxel grid for each scene from multi-view 2D feature maps, which are backprojected using known camera intrinsics and extrinsics. We evaluate feature quality under different encoding strategies, including a ResNet50 baseline and a distillation pipeline using DINOv2. Our experiments reveal that using DINOv2 features directly outperforms distillation-based approaches and produces richer embeddings for downstream tasks, although naive clustering on these features proves insufficient for separating anomalies from normal objects. To make object features more discriminative, we apply contrastive learning to compact voxel grid representations, exploring strategies such as synthetic hard negative generation, positive sample extrapolation, and memory banks. Despite limited gains, these experiments highlight the complexity of scene-dependent anomaly separation. We then propose a more efficient alternative: compressing 3D object features into fixed-size embeddings and refining them with a Transformer-based architecture. Attention mechanisms help recalibrate these features within the scene context, while a novel Residual Anomaly Module introduces a learnable "normality centroid". This module allows the system to measure deviation from contextually defined normality, effectively modeling scene-specific feature embeddings. In addition, we consider several 3D object detection models for detecting instances inside the 3D scene, following three main approaches: 2D object detection models combined with multi-view geometric reasoning to recover the instances' 3D bounding boxes, naive 3D object detectors, and multimodal vision-language models. We evaluate our framework on the ToysAD8K and Parts15K datasets, which contain multi-view 3D scenes. The proposed approach demonstrates strong potential for real-world applicability in automated inspection pipelines, particularly where objects are numerous, similar, and only subtly different when anomalous.
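
The contextual framing in the abstract can be illustrated with a minimal sketch, assuming each object in a scene has already been reduced to a fixed-size embedding. Each object is scored by its Euclidean distance from the mean embedding of the other objects in the same scene. This is a non-learned stand-in chosen for illustration, not the thesis's Residual Anomaly Module, whose normality centroid is learnable; the function name `odd_one_out_scores` and the toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence


def odd_one_out_scores(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Score each object by its distance to the mean of the other objects.

    Assumption: a plain leave-one-out mean stands in for the learned
    "normality centroid" mentioned in the abstract.
    """
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two objects to define a context")
    dim = len(embeddings[0])
    # Sum every dimension once; subtracting one object gives its leave-one-out sum.
    totals = [sum(e[d] for e in embeddings) for d in range(dim)]
    scores = []
    for e in embeddings:
        context_mean = [(totals[d] - e[d]) / (n - 1) for d in range(dim)]
        scores.append(math.dist(e, context_mean))
    return scores


# Hypothetical 2-D embeddings for a four-object scene.
scene = [
    [1.0, 0.0],
    [1.1, 0.1],
    [0.9, -0.1],
    [0.0, 2.0],  # the object that differs from the rest
]
scores = odd_one_out_scores(scene)
odd_index = max(range(len(scores)), key=scores.__getitem__)
```

Because the reference point is computed per scene, the same embedding can receive a low score in one scene and a high score in another, which is the sense in which the task is contextual rather than driven by a global notion of normality.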
Supervisors:	Tatiana Tommasi, Paolo Rabino
Academic year:	2024/25
Publication type:	Electronic
Number of pages:	55
Subjects:
Degree programme:	Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class:	New degree system > Master's degree > LM-32 - Computer Engineering (INGEGNERIA INFORMATICA)
Collaborating companies:	Politecnico di Torino
URI:	http://webthesis.biblio.polito.it/id/eprint/36369
