AI Security Assessment: Attacks and Defenses on Large Language Models

Roberto Di Ciaula

AI Security Assessment: Attacks and Defenses on Large Language Models.

Rel. Guido Marchetto, Alessio Sacco. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2024

Abstract

The thesis activity aimed to provide an extensive overview of Large Language Models (LLMs), their usage in companies, and the associated vulnerabilities and security needs, emphasizing frameworks like MITRE, NIST, and OWASP's Top 10 LLM vulnerabilities. We started with an introduction to LLM architectures, including transformers, and discussed state-of-the-art techniques such as fine-tuning, reinforcement learning, retrieval-augmented generation (RAG), LLM agents, and prompt engineering. Then, we highlight how these technologies are widely used in companies utilizing LLMs. Key vulnerabilities are examined, with detailed examples such as prompt injection attacks, the widely used vector and unsafe output handling. To study vulnerabilities and frameworks, attacks on major public LLMs like GPT were conducted or existing ones were analyzed, providing insights into real-world implications and security measures.

Defensive strategies and mitigation tools like Garak, LlamaGuard, and LLM Guard are evaluated and compared