A Synthetic Data Generation Approach for Subgroup-Based Bias Mitigation in Structured Data

Maria Antonietta Longo

Supervisors: Eliana Pastor, Flavio Giobergia. Politecnico di Torino, Master's degree in Mathematical Engineering, 2025

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Decisions are increasingly entrusted to Artificial Intelligence through Machine Learning algorithms, especially in fields such as medical diagnosis, social networks, smart cities, and finance. Because these decisions directly affect people, it is essential to assess their reliability and trustworthiness. Accuracy indicates how well a model performs overall, but it is not enough to determine how far its predictions can be relied upon. A key issue is that models depend on data in which populations are often unevenly represented, which can lead to unfair predictions that disproportionately affect smaller or less represented populations. This phenomenon, known as Representation Bias, arises when the sample used for model development does not adequately capture certain segments of the population, resulting in poor generalization for those groups. When a model systematically misclassifies instances that match specific feature-value pairs, called problematic subgroups, it exhibits bias against the affected populations. Existing bias mitigation methods for tabular data often require prior knowledge of the biases rather than identifying them automatically, which can be limiting when misclassifications stem from complex social contexts. Additionally, some approaches rely on a held-out dataset, which is not always available. This thesis proposes a new model-agnostic bias mitigation method for tabular data that uses an algorithm to automatically identify problematic subgroups and generates new representative data with an interpolation model. This improves model predictions for instances belonging to problematic subgroups and, most importantly, enhances fairness.
Supervisors:	Eliana Pastor, Flavio Giobergia
Academic year:	2024/25
Publication type:	Electronic
Number of pages:	112
Subjects:
Degree programme:	Master's degree in Mathematical Engineering
Degree class:	New organization > Master of Science > LM-44 - Mathematical and Physical Modelling for Engineering
Collaborating companies:	Not specified
URI:	http://webthesis.biblio.polito.it/id/eprint/34725
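
The abstract outlines two steps: automatically identifying problematic subgroups, then generating new data for them by interpolation. It does not name the discovery algorithm or the interpolation model, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes a subgroup is a set of one or two categorical feature=value items, flags a subgroup when its misclassification rate exceeds the overall rate by a threshold, and uses SMOTE-style linear interpolation on numeric features. The function names, thresholds, and toy records are all hypothetical.

```python
import random
from itertools import combinations


def find_problematic_subgroups(rows, errors, cat_features, min_support=0.2, min_gap=0.25):
    """Return (pattern, row_indices, error_rate) for subgroups that are
    misclassified more often than the dataset as a whole.
    Assumption: patterns are limited to one or two feature=value items."""
    n = len(rows)
    overall = sum(errors) / n
    items = sorted({(f, row[f]) for row in rows for f in cat_features})
    candidates = [(item,) for item in items]
    # Two-item patterns must constrain two different features.
    candidates += [p for p in combinations(items, 2) if p[0][0] != p[1][0]]
    found = []
    for pattern in candidates:
        idx = [i for i, row in enumerate(rows) if all(row[f] == v for f, v in pattern)]
        if len(idx) / n < min_support:
            continue  # too few instances to judge
        rate = sum(errors[i] for i in idx) / len(idx)
        if rate - overall >= min_gap:
            found.append((pattern, idx, rate))
    return sorted(found, key=lambda t: t[2], reverse=True)


def interpolate(rows, idx, num_features, n_new, rng):
    """SMOTE-style sketch: each synthetic row lies on the segment between
    two real rows of the subgroup (idx must hold at least two rows).
    Simplification: categorical values and the label come from the first parent."""
    synthetic = []
    for _ in range(n_new):
        a, b = (rows[i] for i in rng.sample(idx, 2))
        t = rng.random()
        new = dict(a)
        for f in num_features:
            new[f] = a[f] + t * (b[f] - a[f])
        synthetic.append(new)
    return synthetic


# Hypothetical toy data: errors[i] is 1 when some model misclassified rows[i].
rows = [
    {"sex": "F", "job": "tech", "age": 30.0, "label": 1},
    {"sex": "F", "job": "tech", "age": 40.0, "label": 1},
    {"sex": "F", "job": "tech", "age": 50.0, "label": 1},
    {"sex": "F", "job": "sales", "age": 35.0, "label": 0},
    {"sex": "F", "job": "sales", "age": 45.0, "label": 0},
    {"sex": "M", "job": "tech", "age": 28.0, "label": 1},
    {"sex": "M", "job": "tech", "age": 38.0, "label": 1},
    {"sex": "M", "job": "sales", "age": 33.0, "label": 0},
    {"sex": "M", "job": "sales", "age": 43.0, "label": 0},
    {"sex": "M", "job": "tech", "age": 55.0, "label": 1},
]
errors = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]

found = find_problematic_subgroups(rows, errors, cat_features=["sex", "job"])
pattern, idx, rate = found[0]
synthetic = interpolate(rows, idx, num_features=["age"], n_new=4, rng=random.Random(0))
print(pattern, rate, len(synthetic))
```

In this sketch the synthetic rows would be appended to the training set before retraining the model. Because the procedure only needs the model's errors on existing data, it stays model-agnostic in the sense the abstract describes.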
