Designing and engineering LLM techniques for detecting novel bash attacks

Alessandro Redi

Supervisors: Marco Mellia, Luca Vassio, Matteo Boffa. Politecnico di Torino, Master's degree programme in ICT for Smart Societies (Ict Per La Società Del Futuro), 2024

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

In recent years, command-line interface (CLI) commands have grown more articulated and complex, which underscores the potential need for advanced tools that can analyze and understand this data efficiently. This thesis focuses on Bash. Command-line interactions, which are integral to system administration, software development, and data processing, often involve sequences of commands (called sessions from here on) and their corresponding outputs, which encode rich semantic detail. However, capturing and leveraging this information for tasks such as session similarity measurement, anomaly detection, and command category classification is a significant challenge, especially when an attacker exploits noisy or obfuscated sessions. One of the main objectives of this thesis is to answer the following question: is it really necessary to understand the semantic meaning of a Bash session, or is syntactic analysis enough for these tasks? The thesis compares different methods for solving the novelty detection problem, ranging from simple, deterministic ones such as TF-IDF, which serve as the baseline, to complex ones such as LLMs. The baseline takes a syntactic approach, whereas the LLMs are used to acquire semantic knowledge through contrastive-loss training on Bash session datasets. The results show that the complex models learn more than basic syntax: they achieve good results on tasks with few sessions but run into greater difficulties as the datasets to analyze grow in size.
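
To make the syntactic baseline mentioned in the abstract more concrete, the following is a minimal sketch of how a TF-IDF novelty score for Bash sessions might look. It is not taken from the thesis: the whitespace tokenizer, the smoothed IDF formula, the treatment of unseen tokens, and the function names (`tokenize`, `fit_idf`, `tfidf_vector`, `cosine`, `novelty_score`) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(session: str) -> list[str]:
    # Assumption: naive whitespace split; a real pipeline would likely
    # use a Bash-aware tokenizer.
    return session.split()


def fit_idf(sessions: list[str]) -> tuple[dict[str, float], float]:
    """Smoothed IDF over the known sessions, plus the IDF given to unseen tokens."""
    n = len(sessions)
    df = Counter()
    for s in sessions:
        df.update(set(tokenize(s)))  # count each token once per session
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    unseen_idf = math.log(1 + n) + 1.0  # document frequency of zero
    return idf, unseen_idf


def tfidf_vector(session: str, idf: dict[str, float], unseen_idf: float) -> dict[str, float]:
    """Sparse TF-IDF vector: raw term count times IDF."""
    tf = Counter(tokenize(session))
    return {tok: count * idf.get(tok, unseen_idf) for tok, count in tf.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def novelty_score(session: str, known_sessions: list[str]) -> float:
    """1 minus the highest cosine similarity to any known session."""
    idf, unseen_idf = fit_idf(known_sessions)
    query = tfidf_vector(session, idf, unseen_idf)
    sims = [cosine(query, tfidf_vector(k, idf, unseen_idf)) for k in known_sessions]
    return 1.0 - max(sims, default=0.0)
```

Under these assumptions, a score close to 0 means the session closely matches one already seen, while a score close to 1 means it shares few tokens with any known session. Because the score depends only on token overlap, an obfuscated session that reuses familiar tokens could still score low, which is the kind of limitation that motivates the semantic approach the abstract contrasts it with.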

Supervisors: Marco Mellia, Luca Vassio, Matteo Boffa
Academic year: 2024/25
Publication type: Electronic
Number of pages: 65
Subjects:
Degree programme: Master's degree programme in ICT for Smart Societies (Ict Per La Società Del Futuro)
Degree class: New regulations > Master's degree > LM-27 - TELECOMMUNICATIONS ENGINEERING
Collaborating companies: Politecnico di Torino - SmartData@PoliTo
URI: http://webthesis.biblio.polito.it/id/eprint/33209