
Federated learning for network traffic analysis

Kai Huang


Supervisors: Marco Mellia, Luca Vassio. Politecnico di Torino, Master's degree programme in ICT for Smart Societies (ICT per la Società del Futuro), 2023

License: Creative Commons Attribution-NonCommercial-NoDerivatives.


Darknets are ranges of IP addresses that do not host any services. Because they constantly receive and record unsolicited traffic, they are valuable instruments for characterizing and detecting Internet-wide events such as the spread of new malware, network scans, and misconfigurations. Darknets observe scanning activity from thousands of sources, and analyzing this traffic to detect coordinated activities provides meaningful information for network security analysis, helping to detect cyber threats and counter them more effectively. Methods such as DarkVec, which borrows from Natural Language Processing by applying the Word2Vec word-embedding model to darknet traffic, can extract meaningful insight from large amounts of data by learning representations of the activities associated with IP addresses. The IP embeddings generated by DarkVec reveal coordinated activities, but they offer only a limited view because they are built from a single network. To overcome this limitation and obtain a general overview, common representations must be built from different networks. The huge volume of darknet traffic makes sharing raw data between darknets impractical, so approaches that expand knowledge should also avoid sharing data. Given the lack of comprehensive ground truth for learning activity patterns in darknet traffic, unsupervised clustering techniques are applied to identify source IP addresses that behave in similar ways. Automatically detecting changes of activity in darknets is crucial for unveiling and mitigating potential cyber threats, which makes it important to track how clusters evolve. This work focuses on two aspects: i) leveraging Federated Learning to build common representations of the source IP addresses observed in different darknets; ii) monitoring the evolution of clusters over time to detect temporal changes in darknet traffic.
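DarkVec's key idea, as summarized above, is to treat darknet traffic as text so that Word2Vec can learn IP embeddings. The following is a minimal sketch of the corpus-building step under stated assumptions: a hypothetical `Packet` record, and packets grouped simply by destination port and a fixed one-hour window (DarkVec's actual grouping of ports into services and its window length may differ). Each group becomes a "sentence" of source IPs ordered by arrival time, and `skipgram_pairs` shows the (center, context) pairs a skip-gram model would then learn from. All names here are illustrative, not taken from the thesis.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical minimal darknet record: arrival time, sender, probed port."""
    ts: float
    src_ip: str
    dst_port: int


def build_sentences(packets: Iterable[Packet], window_s: float = 3600.0) -> List[List[str]]:
    """Return one 'sentence' per (time window, destination port) group.

    Each sentence lists the source IPs of the group ordered by arrival time,
    so IPs that probe the same port at similar times share a context.
    Grouping by raw port and a fixed window is an assumption of this sketch.
    """
    groups: Dict[Tuple[int, int], List[Packet]] = defaultdict(list)
    for p in packets:
        groups[(int(p.ts // window_s), p.dst_port)].append(p)
    sentences = []
    for key in sorted(groups):  # deterministic order: by window, then port
        ordered = sorted(groups[key], key=lambda pkt: pkt.ts)
        sentences.append([pkt.src_ip for pkt in ordered])
    return sentences


def skipgram_pairs(sentence: List[str], context: int = 2) -> List[Tuple[str, str]]:
    """Enumerate the (center, context) pairs a skip-gram model trains on."""
    pairs = []
    for i, center in enumerate(sentence):
        lo, hi = max(0, i - context), min(len(sentence), i + context + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs


if __name__ == "__main__":
    demo = [
        Packet(10.0, "10.0.0.1", 23),
        Packet(12.0, "10.0.0.2", 23),
        Packet(11.0, "10.0.0.3", 23),
        Packet(15.0, "10.0.0.1", 445),
        Packet(4000.0, "10.0.0.2", 23),
    ]
    for s in build_sentences(demo):
        print(s, skipgram_pairs(s, context=1))
```

In practice the sentences would be handed to an existing Word2Vec implementation (for example gensim's `Word2Vec(sentences=...)`) rather than to hand-written training code; the pair generator is only there to make the notion of "context" concrete.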
I develop a federated learning approach for the Word2Vec algorithm to learn common representations collaboratively from different darknets. I design a new strategy to aggregate models of different dimensions, make it scalable to multiple networks, and perform in-depth analyses of different scenarios to stress-test its performance. The quality of the common representations is tested on real-world data, using domain knowledge of IP addresses belonging to well-known Internet scanners as ground-truth classes for validation. Federated learning improves the quality of the representations: for example, when two /24 darknets learn collaboratively, the weighted F1 score rises from 0.83 with local training to 0.88, and the coverage of each participant grows by about 11,000 IP addresses. To unveil the evolution of clusters, I adopt and adapt new metrics that track cluster transitions over time, detecting changes in the clustering as a whole rather than in a single cluster. Conventional metrics such as the silhouette score and the Adjusted Rand Index do not provide enough insight to detect cluster evolution, whereas the new method can identify changes such as the emergence of a new cluster reflecting a new coordinated activity, or whether a cluster is made up of existing sources or reflects a change in their behavior; this helps to identify and understand specific events and activities. The results indicate that learning representations collaboratively extracts more information and yields a more general overview of coordinated activities, and that the method for monitoring cluster evolution provides meaningful insight for detecting and analyzing changes in those activities.
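The abstract does not detail how models of different dimensions are aggregated. One plausible reading is that every darknet's Word2Vec model uses the same embedding size but has a different vocabulary, because each network observes a different set of source IPs. Under that assumption, a minimal FedAvg-style sketch could look like the following; `aggregate` and `distribute` are hypothetical names, weighting each IP by how often each darknet observed it is an illustrative choice rather than the thesis's documented strategy, and only the input (IP) vectors are handled, not Word2Vec's output weights.

```python
from typing import Dict, List

Vector = List[float]
Embeddings = Dict[str, Vector]  # IP address -> embedding vector


def aggregate(clients: List[Embeddings], counts: List[Dict[str, int]]) -> Embeddings:
    """Merge local IP-embedding tables whose vocabularies differ.

    An IP present at several darknets gets the average of its local vectors,
    weighted by how often each darknet observed it; an IP present at only one
    darknet is copied as-is. The result covers the union of all vocabularies.
    Counts are assumed positive; IPs missing from `counts` get weight 1.
    """
    sums: Dict[str, Vector] = {}
    weights: Dict[str, int] = {}
    for emb, cnt in zip(clients, counts):
        for ip, vec in emb.items():
            w = cnt.get(ip, 1)
            if ip not in sums:
                sums[ip] = [0.0] * len(vec)
                weights[ip] = 0
            elif len(vec) != len(sums[ip]):
                raise ValueError(f"embedding size mismatch for {ip}")
            sums[ip] = [s + w * v for s, v in zip(sums[ip], vec)]
            weights[ip] += w
    return {ip: [s / weights[ip] for s in acc] for ip, acc in sums.items()}


def distribute(global_emb: Embeddings, local_emb: Embeddings) -> Embeddings:
    """Seed a client's next local round with the global vectors of the IPs it knows."""
    return {ip: list(global_emb.get(ip, vec)) for ip, vec in local_emb.items()}


if __name__ == "__main__":
    a = {"1.1.1.1": [1.0, 0.0], "2.2.2.2": [0.0, 1.0]}
    b = {"1.1.1.1": [3.0, 2.0], "3.3.3.3": [1.0, 1.0]}
    merged = aggregate([a, b], [{"1.1.1.1": 1, "2.2.2.2": 5}, {"1.1.1.1": 3, "3.3.3.3": 2}])
    print(merged["1.1.1.1"])  # [2.5, 1.5]: (1*[1,0] + 3*[3,2]) / 4
    print(sorted(merged))     # union of both vocabularies
```

Averaging vectors is only meaningful if all participants start each round from the same global vectors (which is what `distribute` provides), since independently trained Word2Vec spaces are not aligned with one another. Because the merged table spans the union of the local vocabularies, a participant that receives it can also obtain vectors for IPs it never observed, which is one plausible mechanism behind the coverage gain reported above.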

Supervisors: Marco Mellia, Luca Vassio
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 75
Degree programme: Master's degree programme in ICT for Smart Societies (ICT per la Società del Futuro)
Degree class: New organization > Master of Science > LM-27 - TELECOMMUNICATIONS ENGINEERING
Collaborating organizations: Politecnico di Torino - SmartData@PoliTo
URI: http://webthesis.biblio.polito.it/id/eprint/27782