Daniela Martorana
Mining user reviews on public transport systems using machine learning techniques.
Supervisors: Silvia Anna Chiusano, Luca Cagliero, Elena Daraio. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2022
Thesis PDF (4MB). Licence: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
With the growing pervasiveness of the digital world, large amounts of textual data are produced in many everyday contexts. An important part of this data consists of user reviews of products and services. Analysing this content with machine learning algorithms can give companies and vendors useful information for improving their offerings. This type of text analysis belongs to Natural Language Processing (NLP) and, more specifically, to Sentiment Analysis. The literature offers many methods for sentiment analysis. Broadly, there are two main families: rule- or lexicon-based approaches, and machine learning algorithms, which are in turn divided into classical machine learning algorithms (e.g. SVM, Naive Bayes, linear regression) and neural networks. In this work, the reference case study is a collection of user-provided labelled reviews of mobility services in the city of Dubrovnik. A data analysis methodology based on NLP techniques was defined to mine useful insights from this data collection. The objectives of the thesis are: to define a new methodology for analysing textual data; to build a sentiment analysis algorithm that correctly classifies reviews as positive or negative; and to find a model that meets the second objective while remaining compact and interpretable, so that it highlights terms and concepts related to the strengths and weaknesses of mobility systems. To meet these objectives, the associative classifier L3 was chosen, because the rules it produces yield a readable and interpretable model.
In addition, several text pre-processing and data transformation techniques were applied in the data preparation phase. In particular, topic modelling techniques were used to derive additional information from the data and to select the subset of relevant features, so that the data collection could be represented in a form suitable for the L3 classifier. In summary, the main contributions of this work are a sentiment analysis algorithm with high performance and, at the same time, an interpretable model, in contrast to other algorithms in the literature. To estimate the quality of the proposed approach, it was compared with traditional methods such as SVM. |
---|---|
Supervisors: | Silvia Anna Chiusano, Luca Cagliero, Elena Daraio |
Academic year: | 2021/22 |
Publication type: | Electronic |
Number of pages: | 104 |
Subjects: | |
Degree programme: | Master's degree programme in Data Science And Engineering |
Degree class: | New organisation > Master's degree > LM-32 - COMPUTER ENGINEERING |
Collaborating companies: | NOT SPECIFIED |
URI: | http://webthesis.biblio.polito.it/id/eprint/22790 |
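
The abstract motivates an associative classifier by the readability of its rules. As an illustration of that general idea, the following is a minimal sketch of a class-association-rule classifier in Python. It is not the L3 algorithm and not the thesis implementation: the toy reviews, the support and confidence thresholds, and the names `mine_rules` and `classify` are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Toy labelled reviews, invented for illustration (not the thesis data).
REVIEWS = [
    ("bus late crowded", "neg"),
    ("driver rude bus late", "neg"),
    ("ticket expensive crowded", "neg"),
    ("bus clean punctual", "pos"),
    ("driver friendly punctual", "pos"),
    ("ticket cheap bus clean", "pos"),
]


def mine_rules(reviews, min_support=2, min_confidence=0.8, max_len=2):
    """Mine rules of the form {terms} -> label (hypothetical helper)."""
    itemset_counts = Counter()  # reviews containing the term set
    rule_counts = Counter()     # reviews containing the term set with a label
    for text, label in reviews:
        terms = sorted(set(text.split()))
        for size in range(1, max_len + 1):
            for itemset in combinations(terms, size):
                itemset_counts[itemset] += 1
                rule_counts[(itemset, label)] += 1

    rules = []
    for (itemset, label), support in rule_counts.items():
        confidence = support / itemset_counts[itemset]
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, support, confidence))

    # Rank by confidence, then support, then prefer shorter rules.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0]), r[0]))
    return rules


def classify(text, rules, default="neg"):
    """Return the label of the best-ranked rule whose terms all occur."""
    terms = set(text.split())
    for itemset, label, _support, _confidence in rules:
        if terms.issuperset(itemset):
            return label
    return default  # no rule matched (the default label is an assumption)


RULES = mine_rules(REVIEWS)
for itemset, label, support, confidence in RULES:
    print(f"{set(itemset)} -> {label} (support={support}, confidence={confidence:.2f})")
print(classify("bus late again", RULES))
```

Each mined rule can be read directly as a term set associated with a sentiment label, which is the kind of interpretability the abstract contrasts with less transparent models such as SVM.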