
Graph Data Science and machine learning applications

Antonella Cardillo

Graph Data Science and machine learning applications.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree in Mathematical Engineering (Corso di laurea magistrale in Ingegneria Matematica), 2024

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Connectivity is the most pervasive feature of today's networks and systems. From protein interactions to social networks such as Facebook or LinkedIn, from communication systems to electrical power networks, and from economic systems such as marketing or retail banking networks to networks of neurons, networks with even a moderate degree of complexity are not random: no statistical distribution can be assumed for their connections, not least because these networks are not static. Classical statistical analysis would be able neither to describe nor to predict behavior within connected systems. As data becomes increasingly interconnected and systems increasingly sophisticated and complex, it is essential to exploit the rich and evolving relationships within our data, using technologies built to leverage relationships and their dynamic nature. Graphs are powerful structures, useful not only for modeling connected information but also for supporting multiple types of analysis. Graph Data Science (GDS) is a graph-driven approach to gaining knowledge from the relationships in data, using queries and algorithms specifically designed to uncover insights in graph data by focusing on the interactions between entities. Graph-powered machine learning is the area of GDS that applies graph data and the results of graph analytics to train machine learning models. To strengthen a machine learning workflow with graphs, we must be able to store, access and handle these structures efficiently as real systems grow more complex and their data more interconnected.
The theoretical core of this work is the study of a general-purpose data management technology, the graph database. Across domains from finance to healthcare and logistics, graph databases are designed to represent relationships, giving links the same importance as the data itself and allowing them to be navigated efficiently. After exploring the theory and inner workings of the main graph algorithms, together with some representative use cases, we ask: how can graph algorithms improve the learning phase of a machine learning workflow? Connected feature extraction is the most practical way to start improving ML predictions with graph algorithms. Including the right mix of connected features in a machine learning model can increase performance, because the features directly influence how the model learns. We analyze three use cases: the link prediction problem, a graph recommender system, and an anti-fraud model built on graphs. Together they show how graphs can offer an original and efficient solution in terms of graph feature extraction, data modeling, and computational efficiency of the algorithms used, compared with classical machine learning techniques.
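
As an illustration of what connected feature extraction can look like in practice, the following minimal Python sketch computes a few standard graph features (degree, triangle count, common neighbors, Jaccard similarity) that could be fed to a classical model for link prediction. It is not taken from the thesis: the toy edge list, the function names and the choice of features are all assumptions made for the example, and only the standard library is used.

```python
from itertools import combinations

# Hypothetical toy graph: undirected edges between made-up node IDs.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbors."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def node_features(adjacency, node):
    """Per-node connected features: degree and number of triangles through the node."""
    neighbors = adjacency[node]
    # A triangle exists for every pair of neighbors that are themselves connected.
    triangles = sum(
        1 for u, v in combinations(sorted(neighbors), 2) if v in adjacency[u]
    )
    return {"degree": len(neighbors), "triangles": triangles}


def pair_features(adjacency, u, v):
    """Per-pair features commonly used as link prediction signals."""
    common = adjacency[u] & adjacency[v]
    union = adjacency[u] | adjacency[v]
    jaccard = len(common) / len(union) if union else 0.0
    return {"common_neighbors": len(common), "jaccard": jaccard}


adjacency = build_adjacency(EDGES)

# Node-level feature table, one row per node, ready to join with tabular data.
feature_table = {node: node_features(adjacency, node) for node in sorted(adjacency)}

# Pair-level features for a candidate (currently missing) link between "a" and "d".
candidate_score = pair_features(adjacency, "a", "d")
```

In a real workflow, rows of `feature_table` and pair-level scores such as `candidate_score` would be appended to the existing feature matrix before training; a graph database or graph library would typically replace the hand-written adjacency dictionary.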

Supervisors: Paolo Garza
Academic year: 2024/25
Publication type: Electronic
Number of pages: 83
Subjects:
Degree programme: Corso di laurea magistrale in Ingegneria Matematica (Master's degree in Mathematical Engineering)
Degree class: New organization > Master's degree > LM-44 - Mathematical and Physical Modelling for Engineering
Collaborating companies: DATA Reply S.r.l. con Unico Socio
URI: http://webthesis.biblio.polito.it/id/eprint/32518