Heterogeneous data-driven recommendation systems for books in libraries

Alessandro Speciale

Heterogeneous data-driven recommendation systems for books in libraries.

Supervisors: Luca Vassio, Marco Mellia, Greta Vallero. Politecnico di Torino, Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica), 2022

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

In an era of rapid digital progress, public infrastructures need to innovate through technology in order to improve the quality of their services. This thesis investigates how to extract information from data supplied by "Biblioteche Civiche Torinesi" (BCT) in order to implement a recommender system that suggests books tailored to each reader, with the aim of improving the reading experience. The work followed three main phases.
The first phase was data characterization: the main objective was to define, quantify and preprocess the BCT data to make it suitable as input for a recommender system. This phase also relied heavily on Anobii (a book-based social network) to increase both the amount of data and the information available per book. Anobii data was crucial for finding reading patterns because it carries richer information about each book than the BCT data does. In particular, I analyzed a dataset of approximately 100,000 books; after the necessary filters (Italian language, removal of periodicals and a minimum number of ratings per book), that number dropped to between 3,000 and 8,000, depending on the threshold used. The size of each book record depended largely on the length of the book's Anobii description, so, to keep the information meaningful, I added a filter requiring descriptions of at least 50 characters.
The second phase focused on enriching each book with additional information inferred from the data already available. The goal was to make the environment more interactive by allowing users to specify the characteristics they would like to find in a book. To this end, I used the Anobii book descriptions to extract a mood (anger, sadness, joy or fear) and a sentiment (positive or negative) for each book using natural language processing libraries: the description strings were passed to the libraries, which produced their output using a neural network trained on online comments. I then implemented a term-frequency algorithm to extract meaningful keywords for each book, so that a genre can be characterized by a set of keywords.
Finally, the third phase covered the recommender system itself. Various recommender models were applied and fine-tuned to obtain the best performance for the problem at hand. One of the major problems was how to combine the explicit ratings from Anobii with the implicit book-reader interactions from BCT; the answer was to build implicit-feedback models. Both a shallow and a deep model were implemented. The shallow one is a BPR model using triplet loss, while the deep one is a retrieval model that can optionally be followed by a ranking model to improve the quality of the recommendations. The models were trained on 64% of the dataset created during the first two phases (16% was used for evaluation and 20% for testing). The main evaluation metrics were the mean number of users with a relevant recommendation, the mean number of relevant recommendations per user and the average rank. The models gave a relevant recommendation to more than 20% of the dataset's users and captured the reading patterns of different types of readers. Given the good results of both models, the possibility of hybridizing the two was analyzed to see whether it could lead to interesting improvements.
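
As an illustration of the shallow model mentioned in the abstract, the following is a minimal sketch of a BPR-style triplet loss on implicit feedback. It is not the thesis's implementation: it assumes the standard matrix-factorization formulation of BPR, in which each (reader, borrowed book, non-borrowed book) triplet is scored by dot products of embeddings and the loss is -log(sigmoid(score_pos - score_neg)). All names (`bpr_loss`, `sgd_step`, `train`), the toy interactions and the hyperparameters are hypothetical.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bpr_loss(user_vec, pos_vec, neg_vec):
    """Loss for one (reader, read book, unread book) triplet."""
    diff = dot(user_vec, pos_vec) - dot(user_vec, neg_vec)
    # -log(sigmoid(diff)), written so that exp() never overflows
    if diff > 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def sgd_step(user_vec, pos_vec, neg_vec, lr=0.05, reg=0.01):
    """One in-place SGD update on a triplet (lr and reg are assumed values)."""
    diff = dot(user_vec, pos_vec) - dot(user_vec, neg_vec)
    g = -sigmoid(-diff)  # derivative of the loss with respect to diff
    for k in range(len(user_vec)):
        u, p, n = user_vec[k], pos_vec[k], neg_vec[k]
        user_vec[k] -= lr * (g * (p - n) + reg * u)
        pos_vec[k] -= lr * (g * u + reg * p)
        neg_vec[k] -= lr * (-g * u + reg * n)


def train(interactions, n_books, dim=8, steps=2000, seed=0):
    """Train embeddings from {reader_id: set_of_read_book_ids}.

    Assumes every reader has at least one unread book; otherwise the
    negative sampling loop below would never terminate.
    """
    rng = random.Random(seed)
    users = {u: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
             for u in interactions}
    books = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
             for _ in range(n_books)]
    user_ids = sorted(interactions)
    for _ in range(steps):
        u = rng.choice(user_ids)
        pos = rng.choice(sorted(interactions[u]))
        neg = rng.randrange(n_books)
        while neg in interactions[u]:  # resample until the book is unread
            neg = rng.randrange(n_books)
        sgd_step(users[u], books[pos], books[neg])
    return users, books


# Toy data (hypothetical): reader 0 borrowed books 0 and 1, reader 1 books 2 and 3.
users, books = train({0: {0, 1}, 1: {2, 3}}, n_books=4)
```

After training on this toy data, ranking the books for a reader by `dot(users[reader], books[b])` would be expected to place that reader's borrowed books above the others. Only the presence or absence of an interaction is used, which is how explicit ratings and implicit loans can be treated uniformly in such a model.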

Supervisors: Luca Vassio, Marco Mellia, Greta Vallero
Academic year: 2022/23
Publication type: Electronic
Number of pages: 87
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica)
Degree class: New system (Nuovo ordinamento) > Master's degree (Laurea magistrale) > LM-32 - INGEGNERIA INFORMATICA
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/25470