Explaining bias in modern Deep Language Models
Christian Vincenzo Traina
Rel. Elena Maria Baralis, Giuseppe Attanasio. Politecnico di Torino, Master of Science program in Computer Engineering, 2022
Full text: PDF (Tesi_di_laurea, 7 MB). Licence: Creative Commons Attribution Non-commercial.
Abstract
In recent years, episodes of hate speech on the Internet have increased. Hate speech manifests as misogyny, racism, and attacks on minorities. To analyze large amounts of data and curb the spread of hurtful content, modern language models such as BERT are currently employed for automatic hate speech detection. Although these models have outperformed previous solutions, several recent works have shown that they still suffer from unintended bias: biased models tend to be over-sensitive to a limited set of words, basing their entire decision on those words alone and ignoring the context. Much recent work has therefore focused on explaining these models, that is, on understanding their output and how it is obtained. Explanation methods are based either on exploiting the inner workings of the neural network or on analyzing the output by perturbing the input.
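The abstract contrasts two families of explanation methods: those that exploit the network's inner workings (e.g., gradient-based saliency) and those that perturb the input and observe how the output changes. The sketch below gives a minimal example of each, assuming a generic BERT-style binary classifier from the Hugging Face transformers library; the checkpoint name, the convention that class index 1 is the "hateful" class, and the helper names are illustrative assumptions, not the setup actually used in the thesis.

```python
# Minimal sketches of the two explanation families named in the abstract,
# assuming a BERT-style binary classifier where class index 1 = "hateful".
# The checkpoint below is a placeholder; substitute any fine-tuned model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINT = "bert-base-uncased"  # illustrative placeholder
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT)
model.eval()

def hateful_prob(text: str) -> float:
    """Probability the classifier assigns to the assumed 'hateful' class."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

def occlusion_scores(text: str) -> list[tuple[str, float]]:
    """Perturbation-based family: drop one word at a time and measure how
    much the prediction changes. A large drop means the decision hinges
    on that single word, the over-sensitivity pattern described above."""
    words = text.split()
    base = hateful_prob(text)
    return [
        (w, base - hateful_prob(" ".join(words[:i] + words[i + 1:])))
        for i, w in enumerate(words)
    ]

def saliency_scores(text: str) -> list[tuple[str, float]]:
    """Inner-workings family: input-times-gradient saliency per token."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    # Detach the embeddings so they become a leaf tensor we can grad against.
    emb = model.get_input_embeddings()(enc["input_ids"]).detach()
    emb.requires_grad_(True)
    logits = model(inputs_embeds=emb, attention_mask=enc["attention_mask"]).logits
    logits[0, 1].backward()  # gradient of the 'hateful' logit
    scores = (emb.grad * emb).sum(-1).abs()[0]
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return list(zip(tokens, scores.tolist()))
```

If either probe concentrates almost all of its attribution on a single identity term regardless of the surrounding context, that is the symptom of unintended bias the abstract describes.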
