Gabriele Esposito
LLMs in the SIEM Loop: A Contract-Based Framework for Threat Detection with an Evaluation on Windows Telemetry and MITRE ATT&CK Mapping.
Supervisor: Andrea Atzeni. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025
PDF (Tesi_di_laurea)
- Thesis
License: Creative Commons Attribution Non-commercial No Derivatives. Download (6MB)
| Abstract: |
Security Information and Event Management (SIEM) platforms centralize and correlate heterogeneous telemetry to surface suspicious behavior. Yet a stubborn gap remains between raw events and analyst-ready claims about what actually happened—claims that align with operational abstractions such as MITRE ATT&CK techniques. Large language models (LLMs) are a natural candidate for this semantic bridge: they read unstructured text well and can map descriptions to controlled vocabularies. However, the usual way LLMs are applied—open prompts, long contexts, free-form outputs—sits uneasily with security operations. Hallucinated details, brittle formatting, unclear provenance, and privacy constraints make naïve integration impractical. This thesis asks a practical question: how can a SIEM pipeline employ LLMs to transform telemetry into attack-informed, auditable artifacts under constraints of accuracy, privacy, and governance? Rather than proposing a single “LLM for security,” the thesis advances a modular architecture in which multiple LLM operators—potentially different models with different inductive biases—are composed under narrow contracts and surrounded by validation, retrieval, and feedback. Each operator performs one disciplined transformation of evidence (e.g., condense noisy events; map a behavior to ATT&CK; justify a claim), and each speaks a constrained interface, so that downstream components can enforce schema, check consistency, and keep provenance. The design goal is not maximum model power, but governability: the ability to reason about, audit, and evolve the system as models change. To make this design concrete without overreaching, the thesis exercises one thin, end-to-end path through the architecture. 
A disciplined Windows lab executes one Atomic Red Team technique per run; Sysmon and selected Windows channels are exported to XML to retain structured fields; a reporter operator compresses each run into one neutral sentence in technical English; a mapper operator outputs only a list of ATT&CK IDs as a plain string. The reporter is instantiated with GPT-4o; the mapper is instantiated as a compact, locally deployable Mistral-7B adapted with LoRA. GPT-4o is also used as a comparator mapper under the same “IDs-only” instruction. The work is intentionally scoped. It builds neither correlation rules nor a full SIEM; it designs and evaluates LLM operators that fit inside a SIEM pipeline. Outputs are SIEM-ready: one sentence and a set of technique IDs are exactly the kind of governed artifact a validator can wrap into a Detection Record and forward to an alert store, where existing rules can corroborate and contextualize it. The thesis also records real-world constraints: a small CTI training corpus (especially for rare techniques), a hosted reporter in the lab that a production SOC would replace with on-prem/VPC inference or sanitized inputs, and no retrieval, validation, or feedback stage in the prototype path. These are not oversights; they are transparent boundaries that convert a sprawling problem into a tractable experiment. Overall, the thesis contributes: a survey that situates LLMs across reliability, DFIR/log analysis, and ATT&CK-aligned SIEM practice; a governed design framework with narrow contracts and explicit guardrails; and a working instantiation with reproducible procedures and transparent limits. |
|---|---|
| Supervisors: | Andrea Atzeni |
| Academic year: | 2025/26 |
| Publication type: | Electronic |
| Number of pages: | 113 |
| Subjects: | |
| Degree programme: | Master's degree programme in Computer Engineering (Ingegneria Informatica) |
| Degree class: | New system > Master's degree > LM-32 - COMPUTER ENGINEERING |
| Collaborating companies: | Politecnico di Torino |
| URI: | http://webthesis.biblio.polito.it/id/eprint/37688 |
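
The abstract describes a mapper operator that emits only ATT&CK IDs as a plain string, and a validator that can wrap the reporter's sentence and those IDs into a Detection Record. A minimal sketch of such a validator is given here as an illustration only: the `DetectionRecord` fields, the function names, and the accepted separators (commas or whitespace) are assumptions, not the thesis's actual schema or code.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# ATT&CK technique IDs look like T1059, or T1059.001 for sub-techniques.
TECHNIQUE_ID = re.compile(r"T\d{4}(?:\.\d{3})?")


@dataclass(frozen=True)
class DetectionRecord:
    """Hypothetical record shape; every field name here is an assumption."""
    run_id: str
    summary: str                    # the reporter's one-sentence description
    technique_ids: Tuple[str, ...]  # validated ATT&CK IDs from the mapper
    mapper_model: str               # provenance: which model produced the IDs


def parse_ids_only(raw: str) -> Tuple[str, ...]:
    """Enforce an IDs-only contract on the mapper's plain-string output.

    Accepts comma- or whitespace-separated IDs and rejects anything else,
    so free-form model text never reaches downstream components.
    """
    tokens = [t for t in re.split(r"[,\s]+", raw.strip()) if t]
    if not tokens:
        raise ValueError("mapper returned no technique IDs")
    bad = [t for t in tokens if not TECHNIQUE_ID.fullmatch(t)]
    if bad:
        raise ValueError(f"contract violation, not ATT&CK IDs: {bad}")
    # De-duplicate while preserving the mapper's order.
    return tuple(dict.fromkeys(tokens))


def build_record(run_id: str, summary: str, mapper_output: str,
                 mapper_model: str) -> DetectionRecord:
    """Check both operator outputs and wrap them into one record."""
    summary = summary.strip()
    if not summary or "\n" in summary:
        raise ValueError("reporter contract expects a single-line sentence")
    return DetectionRecord(run_id, summary,
                           parse_ids_only(mapper_output), mapper_model)
```

Under these assumptions, `build_record("run-001", "A scheduled task was created.", "T1053.005, T1059.001", "mistral-7b-lora")` yields a record with two technique IDs, while a mapper output such as `"The technique is T1059"` raises `ValueError`. Rejecting malformed output outright, rather than extracting IDs from surrounding prose, is one way to keep the interface narrow enough to audit.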



Creative Commons License - Attribution 3.0 Italy