Politecnico di Torino (logo)

Graph Neural Networks, Multi-Threading and Heuristics for the Computation of the Maximum Common Subgraph

Salvatore Licata

Graph Neural Networks, Multi-Threading and Heuristics for the Computation of the Maximum Common Subgraph.

Rel. Stefano Quer, Giovanni Squillero, Andrea Calabrese. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2023

PDF (Tesi_di_laurea) - Tesi
Licenza: Creative Commons Attribution Non-commercial No Derivatives.

Download (2MB) | Preview

In recent years, we have seen a rising usage of graph-like data structures in several fields, with a wide range of applications in real-world scenarios. Various problems known by the computer science community for a long time have come back to the surface, backed up by new ideas and algorithms to challenge their commonly high degree of complexity. The Maximum Common Subgraph problem is one of them, as it is a well-known computational problem in graph theory. The MCS of the two graphs is the largest common subgraph between them, and it has great relevance in numerous fields, ranging from computer vision to malware detection and bioinformatics. Unfortunately, this problem is NP complex, making it difficult to solve in a reasonable amount of time. Many researchers have tried to develop novel approaches to explore the enormous search space generated on average by this problem. Since 2016, a new algorithm, McSplit, has become one of the most commonly adopted baselines to face this task. McSplit is a recursive procedure that exploits a branch-and-bound technique to prune the search space aggressively when the current evaluation tree is upper-bounded. Each recursion step chooses which vertex pair from the two graphs has to be added to the current solution through the score returned by a heuristic function. This strategy laid the foundation for many subsequent works, which mainly aimed at enhancing it by using more efficient heuristics. Other approaches have also leveraged novel Deep Learning methods, like Graph Neural Networks, exploiting big data to train models to find deep representations inside graphs. This work explores three different techniques to extend the current state-of-the-art McSplit variant, called McSplitDAL. In the first one, we analyze the most recent variants of McSplit: McSplitRL, McSplitLL, and McSplitDAL. These algorithms maintain the same core as the original one but substitute the scoring heuristic with a table of rewards, accurately updated as in classical Reinforcement Learning frameworks. We also introduce new heuristics, which proved to lead to better solutions than the out-of-the-box variants of these algorithms. In the second one, we focus on exploiting parallelization. We build an iterative version of the McSplit algorithm to optimize the load distribution among different threads and allow for a faster and broader search space exploration. Additionally, we introduce an enhancement of an existing parallel version of McSplit using our heuristics, which provides benefits with almost no additional overhead. In the last one, we present techniques based on Graph Neural Networks. This methodology differs completely from the previous ones, and it has been inspired by the recent growing interest in GNNs. We propose two different GNN-based trainable architectures to select the vertices to be added to the current MCS at each algorithm step. Concerning results, the first approach improves the current state-of-the-art on single-thread execution, and it has been accepted at ICSOFT 2023 (18th International Conference on Software Technologies) as a full paper. The parallel strategies outperform the single-thread algorithm, proving the benefits of parallelization. Finally, the GNN-based technique obtains good results, and it is able to process larger graphs. Among the proposed methods, the last one has the most significant chance of improvement: accurate finetuning and a more complex architecture may provide additional gains.

Relators: Stefano Quer, Giovanni Squillero, Andrea Calabrese
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 81
Corso di laurea: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Classe di laurea: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Aziende collaboratrici: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/27709
Modify record (reserved for operators) Modify record (reserved for operators)