Methods and Measures for bias detection in natural language processing: A study on word embeddings and masked models

Nicola Maddalozzo

Supervisors: Eliana Pastor, Laura Alonso Alemany. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2023

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

The society in which we live is shaped by prejudices that discriminate against specific groups of the population. In recent years, these biases have been detected in the textual data used to train natural language processing algorithms, so tools built on these algorithms present biases that harm specific categories of people. Beyond the harm they cause to the people affected, such tools do not comply with the fundamental right to non-discrimination, which may result in legal action against the companies and institutions that created them. The scientific community has therefore developed methods and metrics to detect, characterize, and measure this type of bias in natural language processing tools.

In this thesis, we apply these methods to analyze two different types of tools used in natural language processing