Graph Neural Networks for Relational Databases Analysis

Andrea Mirenda

Graph Neural Networks for Relational Databases Analysis.

Supervisors: Paolo Garza, Luca Colomba, Daniele Loiacono. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Relational databases are the backbone of modern data infrastructure, supporting much of the digital economy. Despite their importance, their rich relational information is often overlooked. Most predictive pipelines still flatten relational schemas into a single table, discarding higher-order relational structure and cross-table dependencies and forcing reliance on costly, fragile feature engineering whose quality depends on expert skill. This thesis adopts a graph-native learning alternative: we cast relational schemas into heterogeneous temporal graphs. Each table in the relational schema becomes a node type, rows become nodes, and foreign keys become typed edges. Some node types are associated with time attributes, representing the timestamp at which a node appears. Crucially, the graph construction is schema-agnostic and automatic: given any relational database, we derive its heterogeneous temporal graph and train a single pipeline for node-level regression and classification tasks. Guided by the goal of efficient and transparent prediction on heterogeneous temporal graphs, this thesis makes three complementary contributions: (i) we evaluate self-supervised pre-training strategies tailored to heterogeneous temporal graphs; (ii) we conduct a systematic exploration of graph-model architectures, training regimes, and design choices to surface robust configurations; (iii) we introduce XMetaPath, a self-explainable GNN that aggregates information over a compact set of X meta-paths and provides faithful explanations for each prediction. In addition, we conduct a systematic study of meta-path selection and implement three methods that automatically discover meta-paths: a greedy approach, LLM-guided scoring, and a reinforcement learning agent that leverages model feedback to prioritize task-relevant relational patterns.
We evaluate these contributions on RelBench, a benchmark of realistic multi-table relational datasets with standardized tasks and splits, and achieve competitive results while introducing a self-explainable model that provides transparent reasoning. Taken together, our experiments yield two takeaways. First, self-supervised pre-training offers reliable gains in this setting. Second, a self-explainable, meta-path-based model provides transparent rationales while matching, and sometimes surpassing, the predictive power of strong non-interpretable baselines.
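
The table-to-graph mapping described in the abstract (tables as node types, rows as nodes, foreign keys as typed edges, optional timestamps on some node types) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function `build_hetero_graph`, its arguments, the `id` primary-key convention, and the toy `customers`/`orders` tables are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_hetero_graph(tables, foreign_keys, time_columns=None):
    """Turn a toy relational schema into a heterogeneous temporal graph.

    tables:       {table name: list of row dicts, each with an "id" primary key}
    foreign_keys: list of (source table, foreign-key column, target table)
    time_columns: {table name: column holding the row timestamp} (optional)
    All names and conventions here are assumptions made for this sketch.
    """
    time_columns = time_columns or {}

    nodes = {}      # node type -> {primary key: node index}
    node_time = {}  # node type -> timestamps, only for types with a time column
    for name, rows in tables.items():
        nodes[name] = {row["id"]: i for i, row in enumerate(rows)}
        if name in time_columns:
            node_time[name] = [row[time_columns[name]] for row in rows]

    # Edge type (source type, fk column, target type) -> list of (src, dst) indices
    edges = defaultdict(list)
    for src, column, dst in foreign_keys:
        for i, row in enumerate(tables[src]):
            key = row.get(column)
            if key in nodes[dst]:  # skip NULL or dangling references
                edges[(src, column, dst)].append((i, nodes[dst][key]))

    return nodes, node_time, dict(edges)


# Toy database: two tables linked by one foreign key.
tables = {
    "customers": [{"id": 10}, {"id": 11}],
    "orders": [
        {"id": 1, "customer_id": 10, "placed_at": 3},
        {"id": 2, "customer_id": 11, "placed_at": 5},
        {"id": 3, "customer_id": 10, "placed_at": 8},
    ],
}
foreign_keys = [("orders", "customer_id", "customers")]

nodes, node_time, edges = build_hetero_graph(
    tables, foreign_keys, time_columns={"orders": "placed_at"}
)
```

Because the function only reads table names, foreign-key declarations, and optional time columns, the same code applies to any schema expressed in this form, which is the sense in which such a construction is schema-agnostic.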
Supervisors:	Paolo Garza, Luca Colomba, Daniele Loiacono
Academic year:	2025/26
Publication type:	Electronic
Number of pages:	156
Subjects:
Degree programme:	Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class:	New organization > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies:	Not specified
URI:	http://webthesis.biblio.polito.it/id/eprint/37707
