Luca Ferraro
A parallel algorithm for mining sequences of spatio-temporal co-location patterns.
Supervisors: Paolo Garza, Luca Colomba. Politecnico di Torino, Master's degree course in Computer Engineering (Ingegneria Informatica), 2023
PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
This master's thesis focuses on spatio-temporal data mining, a specialized field of data mining and spatial analysis that aims to discover interesting patterns, relationships, and insights in data that have both spatial and temporal dimensions. These relationships are useful in a wide spectrum of applications and contexts, from urban planning to epidemiology. Among the different approaches to spatio-temporal data mining, we focus on co-location pattern mining. This method tries to uncover correlations between features or attributes of a dataset whose instances are usually found together in the same geographic area and at the same time. For example, if we consider a dataset of events that may happen in an urban context, we may find that an episode of type "Traffic-congestion" is often spatially and temporally close to an episode of type "Sport-event". If these two types of events are found close together a certain number of times, a co-location pattern ["Traffic-congestion", "Sport-event"] may be discovered. Most state-of-the-art methods for co-location pattern mining consider only the spatial dimension of a dataset. In this context, the work "Parallel Co-location Pattern Mining based on Neighbor-Dependency Partition and Column Calculation" proposes an innovative approach to spatial co-location mining. Its authors divide the entire set of neighbor relationships among instances into partitions. Co-location mining is then applied independently to each partition, employing new ideas to reduce the search space and thus the time and resources required. The algorithm therefore avoids serial processing and the limitations typical of single-machine computing, and it can also be applied to massive spatial datasets. A further step forward in the field of co-location mining was made in the thesis "Temporal co-location pattern discovery in spatiotemporal data through parallel computing".
That thesis shows that the previously mentioned algorithm can be extended to mine spatio-temporal co-location patterns. In this master's thesis, we propose new solutions to make spatio-temporal co-location mining more efficient and scalable. In particular, we introduce new ideas for finding spatio-temporal neighbors effectively. The main contribution of the thesis is the introduction of the concept of a "sequence of co-locations" (or "co-location sequence"). We define a co-location sequence as a collection of co-locations that have an aggregator event in common and respect some spatial and temporal constraints. Co-location sequence mining should be able to discover relationships and correlations among spatio-temporal data that simple co-location mining is not able to find. In this thesis we provide a formal definition of a co-location sequence, introduce suitable metrics to evaluate how interesting the sequences found are, and develop an algorithm for co-location sequence mining that is applicable to huge spatio-temporal datasets (more than 30 M events). Our solutions are developed using the Apache Spark framework and the Python language, and the experiments are run on the BigData@Polito cluster. The results confirm the effectiveness of our algorithm for co-location sequence mining, except for some limitations mainly due to the lack of suitable real-world datasets and computational resources. |
---|---|
Supervisors: | Paolo Garza, Luca Colomba |
Academic year: | 2023/24 |
Publication type: | Electronic |
Number of Pages: | 72 |
Subjects: | |
Degree course: | Master's degree course in Computer Engineering (Ingegneria Informatica) |
Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
Collaborating companies: | UNSPECIFIED |
URI: | http://webthesis.biblio.polito.it/id/eprint/28470 |
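As an illustration of the co-location idea described in the abstract, the following is a minimal single-machine sketch in Python. It is not the thesis algorithm, which is parallel and built on Apache Spark. The toy events, the distance and time thresholds, the helper names, and the use of the participation index (a prevalence measure common in the co-location literature) are all assumptions made here for illustration; the abstract does not state which metric the thesis uses.

```python
from itertools import combinations
from math import hypot

# Hypothetical toy events: (event_id, event_type, x_km, y_km, t_hours).
EVENTS = [
    (1, "Sport-event", 0.0, 0.0, 10.0),
    (2, "Traffic-congestion", 0.3, 0.4, 10.5),
    (3, "Sport-event", 5.0, 5.0, 20.0),
    (4, "Traffic-congestion", 5.2, 5.1, 21.0),
    (5, "Traffic-congestion", 9.0, 9.0, 3.0),
]


def are_neighbors(a, b, max_dist, max_dt):
    """True if two events are close in both space and time (assumed thresholds)."""
    close_in_space = hypot(a[2] - b[2], a[3] - b[3]) <= max_dist
    close_in_time = abs(a[4] - b[4]) <= max_dt
    return close_in_space and close_in_time


def participation_index(events, type_a, type_b, max_dist, max_dt):
    """Participation index of the candidate pattern [type_a, type_b].

    For each type, compute the fraction of its instances that have at least
    one spatio-temporal neighbor of the other type, then take the minimum.
    This brute-force scan is quadratic in the number of events.
    """
    in_a, in_b = set(), set()
    for a, b in combinations(events, 2):
        if {a[1], b[1]} == {type_a, type_b} and are_neighbors(a, b, max_dist, max_dt):
            for event in (a, b):
                (in_a if event[1] == type_a else in_b).add(event[0])
    total_a = sum(1 for e in events if e[1] == type_a)
    total_b = sum(1 for e in events if e[1] == type_b)
    if total_a == 0 or total_b == 0:
        return 0.0
    return min(len(in_a) / total_a, len(in_b) / total_b)


pi = participation_index(
    EVENTS, "Sport-event", "Traffic-congestion", max_dist=1.0, max_dt=2.0
)
# The pattern would be reported if pi reaches a user-chosen minimum threshold.
print(f"participation index: {pi:.3f}")
```

In this toy data both sport events have a nearby traffic congestion, while only two of the three congestions have a nearby sport event, so the index is limited by the congestion side. A scalable version would replace the pairwise scan with partitioned neighbor search, as the works cited in the abstract do.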