Politecnico di Torino

Towards fairness AI: A data-centric approach

Uditi Ojha

Towards fairness AI: A data-centric approach.

Rel. Antonio Vetro'. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2022

License: Creative Commons Attribution Non-commercial No Derivatives.

Title: Towards fairness AI: A data-centric approach
Subtitle: Assessment and mitigation techniques to tackle bias in datasets

The consequences of bias and injustice have received growing attention as AI is increasingly employed in sensitive fields such as health care, hiring, and criminal justice. We know that individual and social biases, which are frequently unconscious, affect and skew human decision-making in many ways. Although it might seem that using data to automate judgments would guarantee fairness, we now know this is untrue. Societal bias can be incorporated into AI training datasets, introduced by decisions made during the machine learning development stage, and reinforced by the intricate feedback loops that form when a machine learning model is deployed in the real world.

Under the guidance of Carmine D'Amico and Prof. Antonio Vetro', this thesis was carried out at Clearbox AI to understand data bias and methods for addressing it. We begin by developing a library for data bias assessment and by examining several bias mitigation strategies. We aim to anticipate unfairness before applying any algorithm by studying the bias associated with protected attributes such as age, ethnicity, gender, education, and marital status. We define bias measure metrics that fall into three main categories: Balance, Equality, and Distance. Balance metrics determine whether the classes of a protected attribute are balanced; Equality metrics determine whether each class of a protected attribute is treated equally; and Distance metrics determine whether the distribution of a protected attribute is close to a target reference distribution. The study is carried out in several stages to evaluate how well the bias measure metrics perform on five datasets chosen from the social and financial domains.

Synthetic data is generated by AI algorithms instead of being collected from real-world cases, and it preserves the statistical and distributional properties of the original dataset. Synthetic data can improve AI systems and address various data-related problems. In our study, we employ synthetic datasets to determine whether they can lessen data bias. The synthetic datasets are provided by three distinct vendors (Syndata, Mostly AI, Gretel), and we evaluate the performance of our bias measure metrics on each of them. A synthetic dataset performs well when it can upsample the rare class of each protected attribute: it should reproduce the original dataset while also producing a new, more evenly distributed one. Gretel typically performs better because it generates synthetic datasets with more balanced ratios across the classes of each protected attribute.

Pre-processing bias mitigation techniques are applied to a subset of the five selected datasets. These techniques are taken from two open-source libraries: the Synthesized SDK and the AI Fairness 360 toolkit (aif360). We assess the performance of our bias measure metrics on the debiased datasets produced by the various pre-processing techniques.

Bias and injustice are ambiguous concepts without a single clear definition; they depend heavily on the context in which they are used. Our goal should be to understand bias at the dataset level and to assess and reduce it as much as possible, using various strategies and research methods.
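The thesis's own assessment library is not reproduced on this page. As a rough, hypothetical illustration of two of the three metric families, the sketch below computes a Balance score (ratio of the rarest to the most frequent class) and a Distance score (total variation distance to a uniform reference) for a single protected attribute. The function names and toy data are our own for illustration, not the library's actual API:

```python
from collections import Counter

def balance_ratio(values):
    """Balance: ratio of the rarest to the most frequent class.
    1.0 means perfectly balanced; values near 0 signal imbalance."""
    counts = Counter(values)
    return min(counts.values()) / max(counts.values())

def tv_distance_to_uniform(values):
    """Distance: total variation distance between the observed class
    distribution and a uniform reference distribution.
    0.0 means the attribute is uniformly distributed."""
    counts = Counter(values)
    n = len(values)
    k = len(counts)
    return 0.5 * sum(abs(c / n - 1 / k) for c in counts.values())

# Toy protected attribute: 2 "F" vs 6 "M" examples.
gender = ["F", "M", "M", "M", "M", "M", "F", "M"]
print(balance_ratio(gender))           # 2/6 -> 0.3333333333333333
print(tv_distance_to_uniform(gender))  # 0.5 * (0.25 + 0.25) -> 0.25
```

An Equality metric would additionally compare an outcome (e.g., a positive label rate) across the classes rather than looking at the attribute's distribution alone.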
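The criterion that a good synthetic dataset should upsample the rare class of each protected attribute can be checked by comparing per-class proportions between the original and the synthetic data. A minimal sketch with made-up toy data (not the actual vendor outputs):

```python
from collections import Counter

def class_proportions(values):
    """Map each class of a protected attribute to its relative frequency."""
    n = len(values)
    return {cls: count / n for cls, count in Counter(values).items()}

original  = ["M"] * 9 + ["F"] * 1   # rare class "F" at 10%
synthetic = ["M"] * 6 + ["F"] * 4   # hypothetical, better-balanced generation

orig_props = class_proportions(original)
rare_class = min(orig_props, key=orig_props.get)  # "F"

# A well-behaved generator raises the rare class's share:
upsampled = class_proportions(synthetic)[rare_class] > orig_props[rare_class]
print(rare_class, upsampled)  # F True
```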
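As one concrete example of the kind of pre-processing technique aif360 offers, Reweighing assigns each example a weight w(g, y) = P(g) * P(y) / P(g, y), so that under the weighted distribution the protected attribute g and the label y become statistically independent. Below is a minimal stdlib re-implementation of that formula for illustration; it is not the aif360 API itself, and the toy data is our own:

```python
from collections import Counter

def reweighing(groups, labels):
    """Per-example weights w = P(group) * P(label) / P(group, label).
    Rare (group, label) combinations receive weights above 1, so the
    weighted data treats group membership and outcome as independent."""
    n = len(groups)
    p_group = Counter(groups)
    p_label = Counter(labels)
    p_joint = Counter(zip(groups, labels))
    return [
        (p_group[g] / n) * (p_label[y] / n) / (p_joint[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]

# Toy data in which "F" rarely receives the positive label 1.
groups = ["F", "F", "F", "M", "M", "M", "M", "M"]
labels = [0, 0, 1, 1, 1, 1, 0, 1]
weights = reweighing(groups, labels)
# The underrepresented (F, 1) pair gets the weight 15/8 = 1.875,
# while the overrepresented (M, 1) pair is downweighted to 25/32.
```

A quick sanity check: with these weights, the weighted probability of (F, 1) equals the product of the weighted marginals P(F) * P(1), which is exactly the independence that Reweighing aims for.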

Relators: Antonio Vetro'
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 113
Degree course: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Degree class: New organization > Master of science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: ClearBox AI Solutions S.R.L.
URI: http://webthesis.biblio.polito.it/id/eprint/24666