
A parallel algorithm for mining sequences of spatio-temporal co-location patterns

Luca Ferraro

A parallel algorithm for mining sequences of spatio-temporal co-location patterns.

Supervisors: Paolo Garza, Luca Colomba. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2023

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

This master’s thesis focuses on spatio-temporal data mining, a specialized field of data mining and spatial analysis that aims to discover interesting patterns, relationships, and insights in data that have both spatial and temporal dimensions. These relationships are useful in a wide spectrum of applications and contexts, from urban planning to epidemiology. Among the different approaches to spatio-temporal data mining, we focus on co-location pattern mining. This method tries to uncover correlations between features or attributes of a dataset whose instances are usually found together in the same geographic area and at the same time. For example, if we consider a dataset of events that may happen in an urban context, we may find that an episode of type "Traffic-congestion" is often spatially and temporally close to an episode of type "Sport-event". If these two types of events are found close together a certain number of times, a co-location pattern ["Traffic-congestion", "Sport-event"] may be discovered. Most state-of-the-art methods for co-location pattern mining consider only the spatial dimension of a dataset. In this context, the work "Parallel Co-location Pattern Mining based on Neighbor-Dependency Partition and Column Calculation" proposes an innovative approach to spatial co-location mining. Its authors divide the entire set of neighbor relationships among instances into partitions. Co-location mining is then applied independently to each partition, employing new ideas to reduce the search space and thus the time and resources required. The algorithm therefore avoids serial processing and the limitations typical of single-machine computing, and it can also be applied to massive spatial datasets. A further step forward in the field of co-location mining was made in the thesis "Temporal co-location pattern discovery in spatiotemporal data through parallel computing".
That thesis shows that the previously mentioned algorithm can be extended to mine spatio-temporal co-location patterns. In this master's thesis, we propose new solutions to make spatio-temporal co-location mining more efficient and scalable. In particular, we introduce new ideas to find spatio-temporal neighbors effectively. The main contribution of the thesis is the introduction of the concept of a "sequence of co-locations" (or "co-location sequence"). We define a co-location sequence as a collection of co-locations that have an aggregator event in common and respect certain spatial and temporal constraints. Co-location sequence mining should be able to discover relationships and correlations among spatio-temporal data that simple co-location mining cannot find. In this thesis we provide a formal definition of a co-location sequence, introduce suitable metrics to evaluate how interesting the discovered sequences are, and develop an algorithm for co-location sequence mining that is applicable to huge spatio-temporal datasets (more than 30 M events). Our solutions are developed using the Apache Spark framework and the Python language, and the experiments are run on the BigData@Polito cluster. The results confirm the effectiveness of our algorithm for co-location sequence mining, except for some limitations mainly due to the lack of suitable real-world datasets and computational resources.
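
To make the "Traffic-congestion" / "Sport-event" example from the abstract concrete, the following is a minimal single-machine sketch in Python of counting how often instances of two different event types are close in both space and time. It is not the thesis's parallel Spark algorithm: the function name `colocated_pairs`, the event layout `(type, x, y, t)`, the Euclidean distance, the thresholds, and the simple count-based filter are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import hypot


def colocated_pairs(events, max_dist, max_dt, min_count):
    """Return pairs of event types whose instances are spatio-temporal
    neighbors at least `min_count` times (naive O(n^2) illustration)."""
    counts = Counter()
    for (type_a, xa, ya, ta), (type_b, xb, yb, tb) in combinations(events, 2):
        if type_a == type_b:
            continue  # a co-location relates different event types
        near_in_space = hypot(xa - xb, ya - yb) <= max_dist
        near_in_time = abs(ta - tb) <= max_dt
        if near_in_space and near_in_time:
            # Store the pair in sorted order so (A, B) and (B, A) coincide.
            counts[tuple(sorted((type_a, type_b)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical toy data: (event type, x, y, timestamp).
events = [
    ("Sport-event", 0.0, 0.0, 10),
    ("Traffic-congestion", 0.5, 0.2, 11),
    ("Sport-event", 5.0, 5.0, 50),
    ("Traffic-congestion", 5.3, 4.9, 52),
    ("Traffic-congestion", 20.0, 20.0, 52),
]

patterns = colocated_pairs(events, max_dist=1.0, max_dt=3, min_count=2)
print(patterns)
```

On this toy data the two event types are neighbors twice, so the pair passes a threshold of 2 but not a threshold of 3. The all-pairs comparison is only workable for small inputs; the abstract's emphasis on partitioning neighbor relationships and on Apache Spark reflects the need to avoid exactly this kind of serial, quadratic scan on datasets with tens of millions of events.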

Supervisors: Paolo Garza, Luca Colomba
Academic year: 2023/24
Publication type: Electronic
Number of pages: 72
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/28470