Erfan Bayat
Adversarial RAG-based approach to counter-narrative generation.
Supervisors: Luca Cagliero, Aurora Gensale. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2025
PDF (Tesi_di_laurea)
- Thesis
Access restricted to: staff users only until 24 October 2026 (embargo date). License: Creative Commons Attribution Non-commercial No Derivatives. Download (4MB)
Abstract
The widespread dissemination of hate speech across digital platforms has encouraged the development of LLM-based counter-narrative defenses. Existing approaches face critical limitations in adaptability, evidence integration, and transparency. While the use of human-generated datasets and fine-tuning approaches has paved the way in recent studies, these methods struggle with evolving hate patterns, require costly retraining, and often produce generic responses that lack the evidence-based reasoning necessary to effectively counter sophisticated, harmful arguments. Moreover, exposing human annotators to hate speech during dataset creation raises significant ethical concerns and introduces systematic biases. This thesis presents a novel adversarial debate framework in which LLM agents assume opposing roles, with one defending problematic positions and the other developing evidence-based counterarguments through structured eight-turn debates.
This process enables the generation of hate and counter-hate speech that is more sophisticated than that found in current state-of-the-art datasets.
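The debate structure described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the names `Turn`, `run_debate`, `stub_advocate`, and `stub_counter` are hypothetical, the two agents are plain stub functions standing in for LLM calls, and the sketch assumes that "eight-turn" means eight alternating utterances in total with the agent defending the problematic position speaking first. Evidence retrieval is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    index: int  # 1-based position in the debate
    role: str   # "advocate" (defends the position) or "counter" (rebuts it)
    text: str


# Hypothetical agent interface: (claim, transcript so far) -> next utterance.
# In a real system this would wrap an LLM call; here it is any callable.
Agent = Callable[[str, List[Turn]], str]


def run_debate(claim: str, advocate: Agent, counter: Agent,
               n_turns: int = 8) -> List[Turn]:
    """Alternate the two agents for n_turns, advocate first (an assumption)."""
    transcript: List[Turn] = []
    for i in range(1, n_turns + 1):
        if i % 2 == 1:
            role, agent = "advocate", advocate
        else:
            role, agent = "counter", counter
        transcript.append(Turn(i, role, agent(claim, transcript)))
    return transcript


# Stub agents that return placeholder text instead of model output.
def stub_advocate(claim: str, transcript: List[Turn]) -> str:
    return f"[argument {len(transcript) // 2 + 1} in defence of the claim]"


def stub_counter(claim: str, transcript: List[Turn]) -> str:
    # The counter agent responds to the most recent advocate utterance.
    return f"[evidence-based rebuttal of turn {transcript[-1].index}]"


debate = run_debate("<placeholder claim>", stub_advocate, stub_counter)
for turn in debate:
    print(turn.index, turn.role, turn.text)
```

Passing the full transcript to each agent lets either side condition on everything said so far, which is one simple way to keep a multi-turn exchange coherent; other designs (for example, passing only the last utterance) are equally possible.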
