Mask2KAN: A Universal Image Segmentation Kolmogorov–Arnold Network Architecture

Gianluca Guzzetta

Supervisors: Carlo Masone, Shyam Nandan Rai. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Universal architectures such as Mask2Former have changed how image segmentation is approached. Semantic, instance, and panoptic segmentation were traditionally handled by specialized architectures; a single unified architecture can now outperform these task-specific models, with benefits in performance, efficiency, and engineering effort. Like other universal models such as DETR, however, these architectures share one underlying structure but are still trained separately for each task and dataset. The experiments in this thesis use the Mask2Former configuration for semantic segmentation. Recent work on moving from the Universal Approximation Theorem to the Kolmogorov-Arnold representation theorem motivated this study of Kolmogorov-Arnold Networks (KANs) for computer vision tasks. Traditional semantic segmentation models, including Mask2Former, recognize a predefined set of classes and often fail to detect unseen objects (anomalies). To address this, we propose Mask2KAN, a novel approach derived from the Mask2Former architecture. It moves from per-pixel classification to mask classification (as in Mask2Former), targets out-of-distribution (OOD) anomalies (as in Mask2Anomaly), and uses an efficient KAN mask-embedding prediction head, improving the segmentation of unseen objects and reducing false positives. The proposed architectures use ResNet-50 and Swin-T/S/B/L backbones. With KAN mask-embedding layers, the approach sets a new state of the art in anomaly segmentation and demonstrates superior performance across various semantic segmentation benchmarks, making it a robust option for real-world scenarios such as autonomous driving and anomaly detection in the wild. For more details and code, visit our GitHub page: https://github.com/gguzzy/benchmark
Supervisors:	Carlo Masone, Shyam Nandan Rai
Academic year:	2024/25
Publication type:	Electronic
Number of pages:	47
Subjects:
Degree programme:	Master's degree programme in Data Science and Engineering
Degree class:	New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies:	Politecnico di Torino
URI:	http://webthesis.biblio.polito.it/id/eprint/33176
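
The abstract describes replacing the mask-embedding prediction head with a KAN. The sketch below is a minimal illustration of that idea and not the thesis implementation. It assumes the Mask2Former convention that each query's mask embedding is combined with per-pixel features by a dot product to give mask logits. The class `KANLayer`, the helper `mask_logits`, the Gaussian basis (standing in for the B-splines of the original KAN formulation), and all sizes are hypothetical choices made for illustration.

```python
import math
import random


class KANLayer:
    """Toy KAN-style layer: one learnable univariate function per edge.

    Assumption: each edge function is a weighted sum of fixed Gaussian
    bumps, used here in place of B-splines to keep the sketch short.
    """

    def __init__(self, in_dim, out_dim, num_basis=5, seed=0):
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        # Basis centres spread evenly over [-1, 1].
        self.centres = [-1.0 + 2.0 * k / (num_basis - 1) for k in range(num_basis)]
        self.width = 2.0 / (num_basis - 1)
        # coeffs[o][i][k]: weight of basis k on the edge from input i to output o.
        self.coeffs = [[[rng.gauss(0.0, 0.1) for _ in range(num_basis)]
                        for _ in range(in_dim)] for _ in range(out_dim)]

    def edge(self, o, i, x):
        """Evaluate the univariate function on edge (i -> o) at scalar x."""
        return sum(c * math.exp(-((x - mu) / self.width) ** 2)
                   for c, mu in zip(self.coeffs[o][i], self.centres))

    def __call__(self, x):
        """Output o is the sum over inputs i of the edge function applied to x[i]."""
        return [sum(self.edge(o, i, x[i]) for i in range(self.in_dim))
                for o in range(self.out_dim)]


def mask_logits(mask_embedding, pixel_features):
    """Dot product of one query's mask embedding with every per-pixel feature."""
    return [[sum(e * f for e, f in zip(mask_embedding, feat)) for feat in row]
            for row in pixel_features]


# Hypothetical sizes: 3 queries, 8-dim query features,
# 4-dim mask embeddings, and a 2x2 per-pixel feature map.
rng = random.Random(1)
queries = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]
pixel_features = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
                  for _ in range(2)]

head = KANLayer(in_dim=8, out_dim=4)  # stands in for the mask-embedding head
masks = [mask_logits(head(q), pixel_features) for q in queries]  # one 2x2 logit map per query
```

Unlike an MLP layer, which applies a fixed nonlinearity after a learned linear map, this layer learns the nonlinearity on every edge and simply sums the results, which is the structural idea KANs take from the Kolmogorov-Arnold representation theorem.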
