Methods to Assess the FR Government’s Current Role as a Data Provider for AI

Cristian Degni

Methods to Assess the FR Government’s Current Role as a Data Provider for AI.

Supervisors: Paolo Garza, Raphaël Troncy. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2026

PDF (Tesi_di_laurea) - Thesis
Access restricted to: staff users only until 27 September 2027 (embargo date).
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Governments publish high-quality institutional data and content (service portals, statistical definitions, numerical indicators), but the training corpora of Large Language Models (LLMs) are largely opaque. Consequently, it is unclear whether and to what extent these sources actually contribute to model behavior, and through which observable signals their impact can be rigorously assessed. In this thesis, we study the role of French institutional sources as data providers for open-weight LLMs by replicating and adapting a two-component evaluation framework to the French context. The first test is leakage-oriented: it measures behavioral recall signals of (i) correct numerical values and (ii) verifiable provenance/attribution, under controlled prompting and without external retrieval.

The second test is intervention-oriented: it treats targeted unlearning as an ablation, comparing baseline models with their unlearned variants on target and safe queries to estimate the selective effect on targets and the associated trade-off with overall utility.
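
As a rough illustration of the comparison described for the second test, the sketch below computes how much accuracy drops on target queries versus safe queries after unlearning. It is not the thesis's implementation: the function name `selective_effect`, the per-query 0/1 correctness scores, and the "selectivity" summary (target drop minus safe drop) are all assumptions introduced here for illustration.

```python
from statistics import mean


def selective_effect(base_target, unl_target, base_safe, unl_safe):
    """Compare a baseline model with its unlearned variant.

    Each argument is a list of per-query scores (here 0 or 1 for
    incorrect/correct). All names and the 'selectivity' summary are
    hypothetical; the thesis may define its metrics differently.
    """
    # Accuracy lost on the queries tied to the unlearned sources.
    target_drop = mean(base_target) - mean(unl_target)
    # Accuracy lost on unrelated queries, used as a proxy for utility cost.
    safe_drop = mean(base_safe) - mean(unl_safe)
    return {
        "target_drop": target_drop,
        "safe_drop": safe_drop,
        # Large when the drop is concentrated on the targets.
        "selectivity": target_drop - safe_drop,
    }


# Toy scores, invented purely to exercise the function.
result = selective_effect(
    base_target=[1, 1, 1, 0],
    unl_target=[0, 0, 1, 0],
    base_safe=[1, 1, 0, 1],
    unl_safe=[1, 1, 0, 1],
)
print(result)
```

Under these assumptions, a large target drop combined with a safe drop near zero would suggest that the unlearned material contributed to the target behavior without a broad loss of utility.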