Analysis of the impact of language and context in prompts on synthetic data generation with Large Language models

Gioele Giachino

Analysis of the impact of language and context in prompts on synthetic data generation with Large Language models.

Supervisors: Antonio Vetro', Marco Rondina. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2025

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

The increasing use of Large Language Models (LLMs) across domains has raised concerns about how readily they perpetuate stereotypes and contribute to biased decisions or patterns. Focusing on gender and professional bias, this thesis examines how LLMs shape their responses to ambiguous prompts and thereby contribute to biased dynamics. The analysis follows a structured experimental method: the models are given different prompts involving three combinations of professions, each characterized by a hierarchical relationship. The study uses Italian, a language with pervasive grammatical gender marking, to highlight potential limitations in current LLMs’ ability to generate objective text in non-English languages.

Two popular LLM-based chatbots are examined: OpenAI ChatGPT and Google Gemini.