Salvatore Stefano Furnari
Associative Classification of Spatio-Temporal Data.
Supervisors: Paolo Garza, Luca Colomba. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2023
PDF (Tesi_di_laurea) - Thesis (1MB)
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
Among data mining tasks, the extraction of patterns that capture relevant spatial and temporal dependencies in data is one of the most useful, since it applies to a wide range of fields. In this thesis, we leverage an existing data mining algorithm, specifically designed to handle spatio-temporal events, to extract association rules. These rules are applied in a predictive context to a real-world scenario: a station-based bike-sharing system. More specifically, we analyze several years of historical data from a bike-sharing service based in San Francisco, from which we want to extract patterns useful for building an associative classifier. The objective is to look for patterns that embody both the spatial and the temporal dimensions in a way that is not specific to particular trajectories observed over a region of interest: we aim to generalize this type of information by detecting sequences of events of interest that exhibit spatio-temporally invariant properties. We therefore propose an efficient algorithm to extract this kind of pattern and validate its efficiency and effectiveness on real data. For this purpose, we perform a binary classification task that involves recognizing critical conditions at the bike stations, such as the lack of bikes or the complete occupancy of the docks. We compare the results with those of other models: classical algorithms such as Decision Tree and Random Forest, and a Baseline consisting of an associative classifier that uses a single simple and intuitive rule. Several experimental settings are designed. In some of them, the whole dataset is considered. In others, instead of using the data as a whole, we subdivide the dataset partitions into timeslots of a fixed number of hours, in order to better assess the models' performance in the different parts of the day and their different trends.
Indeed, each timeslot is characterized by its own type of traffic, determined by differences in people's needs: in some timeslots the status of the stations is inevitably more stable, while in others it changes at a more frenetic pace. |
---|---|
Supervisors: | Paolo Garza, Luca Colomba |
Academic year: | 2022/23 |
Publication type: | Electronic |
Number of Pages: | 83 |
Subjects: | |
Degree programme: | Master's degree programme in Data Science And Engineering |
Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
Collaborating companies: | UNSPECIFIED |
URI: | http://webthesis.biblio.polito.it/id/eprint/26816 |