Mask2KAN: A Universal Image Segmentation Kolmogorov–Arnold Network Architecture

Gianluca Guzzetta

Supervisors: Carlo Masone, Shyam Nandan Rai. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024

Thesis (PDF, 8MB)
License: Creative Commons Attribution Non-commercial No Derivatives.

Attached documents (ZIP archive, 84MB)
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Universal architectures such as Mask2Former have redefined how image segmentation tasks are approached. Traditionally, specialized architectures were designed for specific tasks such as semantic, instance, and panoptic segmentation. A single, unified architecture can now outperform these task-specific models, with benefits in performance, efficiency, and engineering effort, and it reshapes how these tasks are perceived. In this work, experiments are conducted using the Mask2Former configuration for semantic segmentation. However, as with other universal models such as DETR, these architectures share the same underlying structure but are still trained separately for different tasks and datasets. Recent work on the passage from the Universal Approximation Theorem to the Kolmogorov-Arnold theorem inspired the present work to explore Kolmogorov-Arnold Networks (KANs) for computer vision tasks. Traditional semantic segmentation models, including Mask2Former, recognize a predefined set of classes and often fail to detect unseen objects (anomalies). To address this, we propose Mask2KAN, a novel approach derived from the Mask2Former architecture. It shifts from per-pixel classification to mask classification (as in Mask2Former), focuses on reducing out-of-distribution (OOD) anomalies (as in Mask2Anomaly), and uses an efficient KAN mask embed prediction head, thereby improving the segmentation of unseen objects and reducing false positives. The proposed architectures use ResNet-50 and Swin-T/S/B/L as backbones. Using KAN mask embed layers sets a new state of the art in anomaly segmentation: the approach demonstrates superior performance across various semantic segmentation benchmarks, making it a robust solution for real-world scenarios such as autonomous driving or anomaly detection in the wild. For more details and code, visit our GitHub page: https://github.com/gguzzy/benchmark
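
The idea of a KAN mask embed head feeding a mask classification step can be illustrated with a minimal sketch. This is not the thesis implementation: the names `KANLayer` and `mask_logits` are hypothetical, the learnable univariate functions use Gaussian bumps on a fixed grid (an assumption that stands in for the B-splines of the original KAN formulation), and the weights are random rather than trained. It only shows the data flow: a query vector is mapped by the KAN layer to a mask embedding, and the mask logit at each pixel is the dot product of that embedding with the per-pixel feature vector.

```python
import math
import random


def silu(x):
    # SiLU base activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


class KANLayer:
    """Computes y_j = sum_i phi_ji(x_i) with one univariate function per edge.

    Assumption for this sketch: phi_ji(x) = w_ji * silu(x)
    + sum_k c_jik * exp(-((x - g_k) / h)^2), i.e. Gaussian bumps on a
    fixed grid instead of B-splines.
    """

    def __init__(self, in_dim, out_dim, grid_size=5, grid_range=(-2.0, 2.0), seed=0):
        rng = random.Random(seed)
        lo, hi = grid_range
        self.h = (hi - lo) / (grid_size - 1)  # spacing between grid centres
        self.grid = [lo + k * self.h for k in range(grid_size)]
        # Randomly initialised (untrained) weights, for illustration only.
        self.base = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                     for _ in range(out_dim)]
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(grid_size)]
                      for _ in range(in_dim)]
                     for _ in range(out_dim)]

    def __call__(self, x):
        out = []
        for j in range(len(self.base)):
            total = 0.0
            for i, xi in enumerate(x):
                # Evaluate the Gaussian basis at this input coordinate.
                basis = [math.exp(-((xi - g) / self.h) ** 2) for g in self.grid]
                total += self.base[j][i] * silu(xi)
                total += sum(c * b for c, b in zip(self.coef[j][i], basis))
            out.append(total)
        return out


def mask_logits(query, pixel_features, head):
    """Mask classification step: one mask per query.

    The head maps the query to a mask embedding; each pixel's logit is the
    dot product of that embedding with the pixel's feature vector.
    """
    embed = head(query)
    return [[sum(e * f for e, f in zip(embed, pix)) for pix in row]
            for row in pixel_features]


# Toy usage: one 4-dimensional query, a 2x2 feature map with 3 channels.
head = KANLayer(in_dim=4, out_dim=3)
query = [0.5, -1.0, 0.2, 1.5]
features = [
    [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]],
    [[0.5, 0.5, 0.5], [-0.2, 0.3, 0.1]],
]
logits = mask_logits(query, features, head)
```

Here `logits` has the same spatial shape as the feature map (2x2), with one value per pixel for the single query. In a full model there would be many queries, each producing its own mask, and the weights would be learned.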

Supervisors: Carlo Masone, Shyam Nandan Rai
Academic year: 2024/25
Publication type: Electronic
Number of Pages: 47
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/33176