polito.it
Politecnico di Torino (logo)

An Approximate Graph-Similarity Algorithm based on Monte Carlo Tree Search

Mattia De Prisco

An Approximate Graph-Similarity Algorithm based on Monte Carlo Tree Search.

Rel. Giovanni Squillero, Stefano Quer. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2019

[img]
Preview
PDF (Tesi_di_laurea) - Tesi
Licenza: Creative Commons Attribution Non-commercial Share Alike.

Download (1MB) | Preview
[img] Archive (ZIP) (Documenti_allegati) - Altro
Licenza: Creative Commons Attribution Non-commercial Share Alike.

Download (170kB)
Abstract:

Graphs are a general and powerful data structure that can model a large variety of concepts. Finding the similarity between two graphs and defining a measure of similarity between a set of graphs are important topics that can have various applications: malware detection, model checking, computer vision, and many more. In this work, the similarity between graphs will be evaluated computing the Maximum Common Subgraph (MCS) both using an exhaustive algorithm, which compares two graphs, and both using an algorithm inspired by a general search algorithm which can instead compute the approximate similarity between a set of graphs. To compute the similarity between graphs, it is necessary to define the equivalence between vertices: given two graphs A and B, a vertex x belonging to A is equivalent to a vertex y belonging to B, if two conditions are met: they are of the same type (which is defined by the specific problem), and if some of x’s parents and/or children vertices belong to some equivalence class, also y must have matching parents and/or children in the same equivalence classes. This definition has been extended to an equivalence between n vertices of n different graphs. State of art exhaustive approaches generally become inapplicable as the dimensions of the graphs grow to few hundreds of vertices. This work addresses this problem by using an algorithm inspired by Monte Carlo tree search (MCTS), which is used to calculate the most promising next move in a decision-making problem, combining the precision of tree search with the generality of random simulation. MCTS is a general algorithm, i.e. it can be applied independently from the particular application, the only requirements being that the particular application can be represented as a tree and that a value can be estimated on its nodes. In the proposed algorithm the hyper partitions technique was conceived to help with the selection of equivalences: vertices of each graph that possess the same features are grouped into separate partitions, and then partitions of different graphs that are characterised by the same features are grouped again into another broader partition (hence the name of the technique). This results in being able to choose one random hyper partition, picking one node for each inner partition (i.e. one node for each input graph) and always obtaining a valid equivalence. Each hyper partition is characterised by the partition "hope", a value that represents how many equivalences are likely to be found on that hyper partition. A node of the search tree represents a state of the game, and it contains a component that manages the various hyper partitions, and a set with all the chosen equivalences till that point. Several values are defined on a node: "hope", which, like for the hyper partitions, is an estimate of how many equivalences can be obtained from that node, "fact", which estimates the value of the already obtained equivalences, and "weight", a combination of the two previous values. The results showed that this technique was able to perform better than the exhaustive algorithm as the dimensions of the graphs reach the thousands of vertices. Moreover, several aspects of the algorithm that can further improve it have already been identified, and they will be implemented in future works.

Relatori: Giovanni Squillero, Stefano Quer
Anno accademico: 2018/19
Tipo di pubblicazione: Elettronica
Numero di pagine: 67
Soggetti:
Corso di laurea: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Classe di laurea: Nuovo ordinamento > Laurea magistrale > LM-32 - INGEGNERIA INFORMATICA
Aziende collaboratrici: NON SPECIFICATO
URI: http://webthesis.biblio.polito.it/id/eprint/11529
Modifica (riservato agli operatori) Modifica (riservato agli operatori)