Gianluca Guzzetta
Mask2KAN: A Universal Image Segmentation Kolmogorov–Arnold Network Architecture.
Supervisors: Carlo Masone, Shyam Nandan Rai. Politecnico di Torino, Master's degree programme (Corso di laurea magistrale) in Data Science And Engineering, 2024
PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives. (8MB)
Archive (ZIP) (Documenti_allegati) - Other
License: Creative Commons Attribution Non-commercial No Derivatives. (84MB)
Abstract:
Universal architectures such as Mask2Former have redefined how image segmentation tasks are approached. Traditionally, specialized architectures were used for specific tasks such as semantic, instance, and panoptic segmentation. A single, unified architecture can now outperform these task-specific models, offering benefits in performance, efficiency, and effort, while also reshaping how these tasks are perceived. In this work, experiments are conducted using the Mask2Former configuration for semantic segmentation. However, as with other universal models such as DETR, these architectures, despite sharing the same underlying structure, are still trained separately for different tasks and datasets. Recent work on moving from the Universal Approximation Theorem to the Kolmogorov-Arnold theorem inspired this thesis to explore Kolmogorov-Arnold Networks (KANs) for computer vision tasks. Traditional semantic segmentation models such as Mask2Former recognize a predefined set of classes and often fail to detect unseen objects (anomalies). To address this, we propose Mask2KAN, a novel approach derived from the Mask2Former architecture, which shifts from per-pixel classification (i.e. BERT) to mask classification (i.e. Mask2Former), focusing on reducing out-of-distribution (OOD) anomalies (i.e. Mask2Anomaly), with an efficient KAN mask embed prediction head, thereby improving the segmentation of unseen objects and reducing false positives. The proposed architectures use ResNet-50 and Swin-T/S/B/L as backbones. Using KAN mask embed layers sets a new state of the art in anomaly segmentation: our approach demonstrates superior performance across various semantic segmentation benchmarks, making it a robust solution for real-world scenarios such as autonomous driving and anomaly detection in the wild. For more details and code, visit our GitHub page: https://github.com/gguzzy/benchmark
Supervisors: Carlo Masone, Shyam Nandan Rai
Academic year: 2024/25
Publication type: Electronic
Number of Pages: 47
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating organizations: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/33176
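The abstract describes replacing the mask embed prediction head with a Kolmogorov-Arnold Network. The following is a minimal sketch of that idea, not the thesis implementation: it assumes a single KAN layer whose per-edge univariate functions are sums of Gaussian radial basis functions (a simplification; KANs are commonly formulated with B-splines), and it forms a mask as the sigmoid of the dot product between the mask embedding and each per-pixel embedding, as in mask classification models of the Mask2Former family. All names (`KANLayer`, `predict_mask`, the dimensions and the toy inputs) are hypothetical.

```python
import math
import random


def rbf_basis(x, centers, width):
    """Evaluate Gaussian radial basis functions at a scalar x."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]


class KANLayer:
    """Toy KAN layer (assumed RBF parameterization, not the thesis code).

    Each edge (input i -> output j) has its own learnable univariate
    function phi_ji(x) = sum_k coef[j][i][k] * basis_k(x).
    Output j is the sum of phi_ji(x_i) over all inputs i.
    """

    def __init__(self, in_dim, out_dim, num_basis=5, x_min=-1.0, x_max=1.0, seed=0):
        rng = random.Random(seed)
        step = (x_max - x_min) / (num_basis - 1)
        self.centers = [x_min + k * step for k in range(num_basis)]
        self.width = step
        self.coef = [
            [[rng.gauss(0.0, 0.1) for _ in range(num_basis)] for _ in range(in_dim)]
            for _ in range(out_dim)
        ]

    def __call__(self, x):
        # Basis values are shared by all outputs, so compute them once per input.
        basis = [rbf_basis(xi, self.centers, self.width) for xi in x]
        out = []
        for coef_j in self.coef:
            total = 0.0
            for coef_ji, basis_i in zip(coef_j, basis):
                total += sum(c * b for c, b in zip(coef_ji, basis_i))
            out.append(total)
        return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_mask(query, head, pixel_embeddings):
    """Mask probability per pixel: sigmoid(mask_embed . pixel_embedding)."""
    mask_embed = head(query)
    return [
        [sigmoid(sum(m * p for m, p in zip(mask_embed, pixel))) for pixel in row]
        for row in pixel_embeddings
    ]


# Toy usage: one 4-dimensional query, a 2x2 grid of 3-dimensional pixel embeddings.
query = [0.2, -0.5, 0.9, 0.0]
pixel_embeddings = [
    [[0.1, 0.3, -0.2], [0.5, -0.4, 0.0]],
    [[-0.3, 0.2, 0.7], [0.0, 0.0, 0.0]],
]
head = KANLayer(in_dim=4, out_dim=3)
mask = predict_mask(query, head, pixel_embeddings)
```

In this sketch the KAN layer takes the place of the small MLP that would otherwise map a query to its mask embedding; the rest of the mask prediction step is unchanged. A trainable version would use a tensor library and learn `coef` by gradient descent.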