Predicting Bicycle Availability By Means Of Data Mining Algorithms

Sorath Asnani

Predicting Bicycle Availability By Means Of Data Mining Algorithms.

Supervisors: Paolo Garza, Luca Cagliero, Silvia Anna Chiusano. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2018

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Concerns about global warming, air and noise pollution, unstable fuel prices and road safety have led policy makers to examine the need for sustainable means of transport. In the context of better urban mobility systems, public Bicycle Sharing Systems (BSSs) have developed considerably in recent years. Community shared bicycle programs are being promoted all over the world as a “green” transportation system.

A BSS is an innovative transportation service, usually aimed at short-distance trips. The core idea of a BSS is that a user takes a bicycle from station A, uses it to travel to another location and returns it at station B. Community shared bicycle programs offer an environmentally friendly and inexpensive means of inner-city transportation.

BSSs have problems related to the limited number of bicycles and the limited number of free slots at stations. On some occasions it is not possible to pick up a bicycle at a certain station because the station might be empty or might hold only broken bikes. It may also happen that a user does not find a free parking slot to drop the bicycle at the destination station.

One possible solution to these problems is to let users know beforehand about the availability of bicycles, so that they can go directly to the stations where bikes or parking slots are available. This can be done by predicting the number of bikes at each station at some future point in time. The major research goal of the thesis is to study and compare models that predict the availability of bikes at Bicing stations some minutes ahead.

Data pre-processing was the critical part of this research. Extensive effort was put into identifying dirty data in the available data set and into cleaning it, in order to ensure the correctness of the data used for training and testing.
After reviewing the literature, we decided to use three models for predicting the number of bikes: the ARMA, the Decision Tree and the Random Forest models. After these models were implemented, a Baseline model based on the Historic Mean was added as a reference against which to measure their performance.

After data cleaning, the total number of stations used in this study was 268. The training data set comprises 800 continuous hours and the testing data set comprises 30 hours of non-overlapping data. All the models were trained using the same data sets.

The performance of the models was measured in terms of absolute errors. The minimum, the maximum and the mean absolute errors were computed for all the models, including the Historic Mean Baseline model. Six prediction models were generated from each of the four algorithms (the three models plus the Baseline), one for each prediction horizon: 10, 20, 30, 40, 50 and 60 minutes ahead.

The mean absolute errors of all the prediction models were compared with each other to rank the performance of the models. The Random Forest algorithm showed the highest mean absolute errors for all the prediction horizons; hence its performance in predicting the number of bikes is the worst in our case.

On the other hand, an interesting pattern emerged between the ARMA and the Historic Mean Baseline models. For predictions 10, 20 and 30 minutes ahead, the Baseline model showed the lowest mean absolute errors, while for longer horizons of 40, 50 and 60 minutes ahead, the ARMA model performed best. These results suggest that the ARMA model can be the best choice for predictions beyond 30 minutes, while for shorter horizons the Historic Mean can be considered the best.

Hence the research concluded that, for predictions from 40 to 60 minutes ahead, the ARMA model is the best of the four, outperforming the Decision Tree, the Random Forest and the Historic Mean Baseline models.
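The evaluation described in the abstract (a Historic Mean baseline scored by minimum, maximum and mean absolute error) can be illustrated with a minimal sketch. The abstract does not define how the Historic Mean is computed, so the code below assumes it is the average bike count observed at the same time-of-day slot across the training period. The function names, the `slots_per_day` parameter and the toy numbers are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict
from statistics import mean


def fit_historic_mean(train, slots_per_day):
    """Average bike count per time-of-day slot (assumed baseline definition).

    `train` is a list of bike counts for one station, sampled at a fixed
    interval and starting at slot 0 of a day.
    """
    buckets = defaultdict(list)
    for i, bikes in enumerate(train):
        buckets[i % slots_per_day].append(bikes)
    return {slot: mean(values) for slot, values in buckets.items()}


def absolute_error_summary(actual, predicted):
    """Return (minimum, maximum, mean) of the absolute errors."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return min(errors), max(errors), mean(errors)


# Toy data: two "days" of four slots each for training (hypothetical values).
train = [2, 4, 6, 8, 4, 6, 8, 10]
test = [3, 6, 7, 8]  # the following, non-overlapping "day"
slots_per_day = 4

profile = fit_historic_mean(train, slots_per_day)
# The test series continues right after the training series.
predicted = [profile[(len(train) + j) % slots_per_day] for j in range(len(test))]

min_ae, max_ae, mae = absolute_error_summary(test, predicted)
print(min_ae, max_ae, mae)
```

The same `absolute_error_summary` function could be applied to the predictions of any other model, which is what makes a ranking by mean absolute error across models and horizons possible.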
Supervisors:	Paolo Garza, Luca Cagliero, Silvia Anna Chiusano
Academic year:	2017/18
Publication type:	Electronic
Number of pages:	66
Subjects:
Degree programme:	Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class:	New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies:	Not specified
URI:	http://webthesis.biblio.polito.it/id/eprint/7999