Automatic grouping of actions in user flow analytics

Gabriele Mario Antonio Spina

Automatic grouping of actions in user flow analytics.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024

Abstract:

Amadeus IT Group, commonly known as Amadeus, is one of the main IT providers for the travel and tourism industry. One of the company's products is Selling Platform Connect, a platform through which travel agencies can access all the distribution services provided by Amadeus and the travel solutions offered by vendors such as airlines, hotels, and railway companies. All actions performed by travel agents on the platform are logged and analyzed by the Data Analytics team, which delivers studies, analyses, and KPIs to turn data into smarter decisions. One of the team's missions is to study user behavior on the platform by interpreting the sequence of actions a user performs, which is called a flow. When studying flows, it is important to reduce their complexity and make them more interpretable. One way to achieve this is to group similar flows with clustering techniques, thereby identifying categories of flows. The first phase of the project focused on exploring the dataset and identifying the actions linked to user behavior. This was followed by a simplification of the collected data, in which flows were characterized as smaller functional units to be extracted from user sessions. Then, different clustering pipelines were proposed for flows of different kinds. A K-means model based on TF-IDF features was proposed to cluster Graphic flows, that is, flows of actions performed through the graphical interface of the platform. A topic model based on Latent Dirichlet Allocation was instead considered for Cryptic flows, which are characterized by the use of the Cryptic command line of Sell Connect. Finally, a study was conducted on the clusterability of embeddings extracted from a predictive LSTM model, which was already in use to predict the next action in a sequence and was reimplemented in the PyTorch framework for this project.
A proof of concept was developed by using this model in a deep clustering framework, training it to learn K-means-friendly embeddings.
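
As a rough illustration of the Graphic-flow pipeline named in the abstract (K-means over TF-IDF features), the sketch below treats each flow as a "document" whose tokens are action names. The thesis is under embargo, so this is not its implementation: the action names, the smoothed IDF formula, and the fixed initial centroids are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical flows: each is a sequence of action names (invented for illustration).
flows = [
    ["search_flight", "select_fare", "book_flight"],
    ["search_flight", "select_fare", "select_fare", "book_flight"],
    ["search_hotel", "select_room", "book_hotel"],
    ["search_hotel", "select_room", "select_room", "book_hotel"],
]


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenised docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of flows in which each action appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Assumed smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, n_iter=20):
    """Plain Lloyd's algorithm; initial centroids are passed in explicitly
    so that the example is deterministic."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


vocab, X = tfidf_vectors(flows)
# Assumption: seed one centroid from a flight flow and one from a hotel flow.
labels, _ = kmeans(X, init_centroids=[X[0], X[2]])
print(labels)
```

With these toy flows, the flight-booking and hotel-booking sequences share no actions, so their TF-IDF vectors are orthogonal and the two groups separate cleanly. Real session data would need a principled choice of the number of clusters and of the initialisation.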

Supervisors: Paolo Garza
Academic year: 2023/24
Publication type: Electronic
Number of pages: 86
Additional information: Thesis under embargo. Full text not available
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New system > Master's degree > LM-32 - Computer Engineering (INGEGNERIA INFORMATICA)
Joint supervision institution: INSTITUT EURECOM (France)
Collaborating companies: AMADEUS SAS
URI: http://webthesis.biblio.polito.it/id/eprint/31119