Politecnico di Torino

Dynamic identification of risk thresholds for balance measures in machine learning

Andrea Adrignola

Dynamic identification of risk thresholds for balance measures in machine learning.

Supervisors: Antonio Vetro', Mariachiara Mecati. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2022

License: Creative Commons Attribution Non-commercial No Derivatives.

Automated decision-making (ADM) systems may significantly affect our everyday life. They can assist us in a number of tasks when used as a reference, or even replace humans entirely, and they are as much an opportunity to challenge our decision-making processes as they are a means of reinforcing pre-existing biases: even if algorithms are mostly neutral, the data used to train them can (and usually do) encode social biases. For this reason, one of the main approaches to mitigating bias in such a framework is to work on data quality.

To assess how the quality of the data affects the outcome of a classification, we made use of two different sets of indices, balance measures and fairness measures, which relate to different stages of a machine learning pipeline. Balance measures assess the proportions of the classes of a given sensitive attribute in the training set, while fairness measures evaluate the fairness of the outcome (in our case a classification on the test set) with respect to the same attribute. In our study we take into account both binary and multiclass attributes.

The aim of the study was to evaluate the feasibility of thresholds for both balance and fairness measures, such that if the balance is above its threshold, then we can assume that the fairness is also above its threshold (in our case we compute the unfairness, so we want it to be below a certain value). In other words, we want to anticipate an incoming bias by estimating the fairness of the classification from the balance of the data. To obtain a sufficient number of instances, we created a large number of synthetic versions of numerous datasets with different levels of balance, to see how balance would affect the fairness of the outcome. To further generalize the study, we included different algorithms. We created thresholds separately for each balance-fairness-algorithm combination, taking into account not only the point of view on how balance and fairness should be evaluated (different measures encode different points of view), but also the specific way data are processed. To measure the goodness of the thresholds, we selected a variety of sensitivity measures.
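
As an illustration of the idea described in the abstract, the following minimal Python sketch computes one balance measure on a sensitive attribute, one unfairness measure on a set of predictions, and then checks how well a pair of thresholds separates fair from unfair outcomes. The abstract does not name the specific measures used in the thesis, so everything here is an assumption chosen for illustration: normalized Shannon entropy as the balance measure, the largest gap in positive-prediction rate between groups as the unfairness measure, and sensitivity and specificity as the quality scores. The function names `shannon_balance`, `positive_rate_gap`, and `evaluate_thresholds` are hypothetical.

```python
from collections import Counter
from math import log


def shannon_balance(attribute_values):
    """Normalized Shannon entropy of the observed class proportions.

    Returns 1.0 when all observed classes are equally frequent and 0.0
    when only one class is present. Assumption: only classes that occur
    in the data are counted, so a class that is entirely missing does
    not lower the score.
    """
    counts = Counter(attribute_values)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return entropy / log(n_classes)


def positive_rate_gap(attribute_values, predictions, positive_label=1):
    """Unfairness as the largest difference in positive-prediction rate
    between any two groups of the sensitive attribute (0.0 = no gap)."""
    totals = Counter(attribute_values)
    positives = Counter(
        g for g, p in zip(attribute_values, predictions) if p == positive_label
    )
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


def evaluate_thresholds(runs, balance_threshold, unfairness_threshold):
    """Score the rule 'balance below its threshold signals unfairness
    above its threshold' over (balance, unfairness) pairs, one pair per
    experiment (for example, one synthetic dataset and one algorithm)."""
    tp = fp = fn = tn = 0
    for balance, unfairness in runs:
        flagged = balance < balance_threshold       # rule predicts a risk
        unfair = unfairness > unfairness_threshold  # risk actually occurred
        if flagged and unfair:
            tp += 1
        elif flagged:
            fp += 1
        elif unfair:
            fn += 1
        else:
            tn += 1
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        # None when the ratio is undefined (no unfair / no fair runs).
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


if __name__ == "__main__":
    groups = ["a", "a", "a", "b"]        # toy sensitive attribute
    predictions = [1, 1, 0, 0]           # toy classifier output
    print(shannon_balance(groups))
    print(positive_rate_gap(groups, predictions))
```

In this sketch a threshold pair is judged by how often a low balance value correctly anticipates a high unfairness value. Both functions work on binary and multiclass attributes, since they only rely on per-class counts.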

Supervisors: Antonio Vetro', Mariachiara Mecati
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 63
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/23512