Autoencoder-Based Feature Extraction and Explainable Anomaly Detection in Network Security

Christian Colella

Supervisors: Alessio Sacco, Guido Marchetto. Politecnico di Torino, Master's degree course in Computer Engineering (Ingegneria Informatica), 2025

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	The growing complexity and heterogeneity of modern network traffic pose a significant challenge to anomaly detection in cybersecurity. Traditional models often fail to generalize across datasets with differing distributions and feature spaces, which limits their robustness in unseen environments. This thesis proposes a unified framework for network anomaly detection that leverages multiple datasets to build a generalizable classification model. The approach uses autoencoders (AEs) to transform multiple datasets into a common feature space so that they can be integrated. An independent AE is trained on each dataset to learn a compact latent representation of its specific traffic patterns, both normal and anomalous. Once trained, only the encoder portion of each AE is retained and used to map the data into its latent space. This process generates meaningful and comparable features across all datasets, neutralizing inconsistencies such as different scaling or feature definitions. The encoded representations are then merged into a single, unified dataset, which is used to train a Multi-Layer Perceptron (MLP) classifier to distinguish between benign and malicious traffic. The approach was evaluated on three benchmark datasets (CIC-IDS2017, BoT-IoT, and UNSW-NB15), each representing distinct network conditions and types of attacks. Experimental results show high detection performance, with F1-scores of 96.1% on CIC-IDS2017, 99.9% on BoT-IoT, and 90.5% on UNSW-NB15, and an overall cross-dataset F1-score of 99.5%. These outcomes confirm the strong generalization capability of the proposed method and its robustness across heterogeneous data sources.
The SHAP (SHapley Additive exPlanations) framework was then employed to interpret the model's predictions, offering insight into the most influential features and providing transparency in the decision-making process. Additionally, SHAP values were explored as a feature selection strategy to assess whether model performance could be improved. Overall, the results confirm that the unified representation provides a reliable and effective strategy for network anomaly detection in heterogeneous environments.
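
The encoding-and-merging step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not state the autoencoder architecture, latent size, preprocessing, or training setup, so the `TinyAutoencoder` class, the `standardize` and `build_unified_dataset` helpers, the latent dimension of 4, and the synthetic arrays standing in for the three benchmark datasets are all assumptions made for illustration. The sketch only shows how datasets with different feature counts can end up in arrays of equal width; training the MLP classifier on the result is omitted.

```python
import numpy as np


class TinyAutoencoder:
    """One-hidden-layer autoencoder: tanh encoder, linear decoder (illustrative only)."""

    def __init__(self, n_features, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 0.1, (n_features, latent_dim))
        self.b_enc = np.zeros(latent_dim)
        self.W_dec = rng.normal(0.0, 0.1, (latent_dim, n_features))
        self.b_dec = np.zeros(n_features)

    def encode(self, X):
        return np.tanh(X @ self.W_enc + self.b_enc)

    def decode(self, Z):
        return Z @ self.W_dec + self.b_dec

    def fit(self, X, epochs=300, lr=0.5):
        """Full-batch gradient descent on the mean squared reconstruction error."""
        losses = []
        for _ in range(epochs):
            Z = self.encode(X)
            err = self.decode(Z) - X
            losses.append(float(np.mean(err ** 2)))
            grad_out = 2.0 * err / err.size          # gradient of the MSE w.r.t. the reconstruction
            grad_Z = grad_out @ self.W_dec.T         # backpropagate through the decoder
            grad_pre = grad_Z * (1.0 - Z ** 2)       # backpropagate through tanh
            self.W_dec -= lr * (Z.T @ grad_out)
            self.b_dec -= lr * grad_out.sum(axis=0)
            self.W_enc -= lr * (X.T @ grad_pre)
            self.b_enc -= lr * grad_pre.sum(axis=0)
        return losses


def standardize(X):
    """Zero-mean, unit-variance scaling per feature; constant columns are left at zero."""
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std == 0, 1.0, std)


def build_unified_dataset(datasets, latent_dim=4):
    """Train one autoencoder per dataset, keep only its encoder, and stack the latent codes."""
    codes, labels = [], []
    for X, y in datasets.values():
        X = standardize(X)
        ae = TinyAutoencoder(X.shape[1], latent_dim)
        ae.fit(X)
        codes.append(ae.encode(X))   # the decoder is discarded after training
        labels.append(y)
    return np.vstack(codes), np.concatenate(labels)


# Synthetic stand-ins with different feature counts (assumed shapes, random labels).
rng = np.random.default_rng(42)
datasets = {
    "dataset_a": (rng.normal(size=(200, 12)), rng.integers(0, 2, 200)),
    "dataset_b": (rng.normal(size=(150, 7)), rng.integers(0, 2, 150)),
    "dataset_c": (rng.normal(size=(100, 20)), rng.integers(0, 2, 100)),
}
Z_all, y_all = build_unified_dataset(datasets)
print(Z_all.shape, y_all.shape)  # (450, 4) (450,)
```

In this sketch, `Z_all` and `y_all` would be the input to a binary classifier. Note that the sketch only guarantees equal dimensionality across the independently trained encoders; how the thesis makes the latent features comparable in practice is not specified in the abstract.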
Supervisors:	Alessio Sacco, Guido Marchetto
Academic year:	2025/26
Publication type:	Electronic
Number of pages:	66
Subjects:
Degree course:	Master's degree course in Computer Engineering (Ingegneria Informatica)
Degree class:	New regulations > Master's degree > LM-32 - Computer Engineering
Collaborating companies:	Politecnico di Torino
URI:	http://webthesis.biblio.polito.it/id/eprint/38601