Politecnico di Torino

Equitable Data Evaluation in Graph Machine Learning

Francesco Paolo Nerini

Supervisors: Luca Dall'Asta, Paolo Bajardi, Andre' Panisson. Politecnico di Torino, Master's degree in Physics Of Complex Systems (Fisica Dei Sistemi Complessi), 2022

License: Creative Commons Attribution Non-commercial No Derivatives.

Data are becoming an increasingly central economic asset for companies and public institutions, and there is therefore a growing need for an equitable evaluation of data. Being able to quantify the value of data is of paramount importance when deciding on potential data-sharing policies among multiple parties. The question is particularly important when the parties are competitors, and even more so when the data are private and their transfer is therefore strictly regulated. A solution to the problem of equitable evaluation is given by the Shapley value, a solution concept from cooperative game theory. In this work, we apply this framework in the context of Graph Machine Learning. Many social, biological and economic systems are naturally described by graphs and networks; this is why recent years have seen the development of Machine Learning frameworks designed specifically to handle graph-structured data and the subtleties it introduces. In these settings, as we explore in this work, the difference in value between datasets can stem not only from the features of the data themselves, but also from knowledge of the relations between data points. First, we show that relying only on the topological features of the subgraph owned by a single party is not enough to fairly estimate the value of individual contributors. Second, we highlight that the data value can change drastically depending on the final purpose of the pooled model, which translates into different testing procedures. Finally, we investigate the possibility of estimating the value of a subgraph using only a small fraction of the entire subset.
This could be particularly useful for a first, exploratory attempt by different parties to share only part of their data to train a common model, before committing to a full collaboration.
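To illustrate the Shapley value mentioned in the abstract, here is a minimal Python sketch; it is not the procedure used in the thesis. For a set N of n parties and a characteristic function v, the Shapley value of party i is φ_i = Σ_{S ⊆ N \ {i}} |S|! (n − |S| − 1)! / n! · [v(S ∪ {i}) − v(S)], i.e. the marginal contribution of i averaged over all orderings of the parties. In the thesis setting, v(S) would be the performance of a model trained on the data pooled by coalition S. Here, as a purely hypothetical stand-in, v(S) counts the edges of a toy graph whose two endpoints are both owned by parties in S, so that value depends on the relations between data points. The names `shapley_values`, `owners`, `edges` and `edge_coverage` are illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition.

    The cost grows as 2**len(players), so this is only feasible for a few parties.
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # size of the coalition S that excludes player i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Weighted marginal contribution of i when it joins S
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical toy data: each party owns a set of nodes of one global graph.
owners = {"A": {0, 1, 2}, "B": {3, 4}, "C": {5}}
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]


def edge_coverage(coalition):
    """Toy utility: edges whose endpoints are both owned by the coalition."""
    nodes = set().union(*(owners[p] for p in coalition))
    return sum(1 for u, w in edges if u in nodes and w in nodes)


phi = shapley_values(list(owners), edge_coverage)
for party, val in phi.items():
    print(f"{party}: own subgraph = {edge_coverage({party})}, Shapley = {val:.2f}")
# A: own subgraph = 2, Shapley = 3.00
# B: own subgraph = 1, Shapley = 2.00
# C: own subgraph = 0, Shapley = 1.00
```

Party C owns a single node with no internal edges, so its own subgraph is worth nothing in isolation, yet its Shapley value is positive because its node connects the subgraphs of the other parties. In this toy setting, this mirrors the point that the topology of a party's own subgraph alone does not determine its value. The values also sum to v(N) = 6, as the efficiency property of the Shapley value requires.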

Supervisors: Luca Dall'Asta, Paolo Bajardi, Andre' Panisson
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 45
Degree program: Master's degree in Physics Of Complex Systems (Fisica Dei Sistemi Complessi)
Degree class: New organization > Master of Science > LM-44 - MATHEMATICAL MODELLING FOR ENGINEERING
Collaborating companies: CENTAI INSTITUTE S.P.A.; CENTAI S.P.A.
URI: http://webthesis.biblio.polito.it/id/eprint/24523