Hybrid Movie Recommender System using NLP techniques for items' features generation

Giovanni Cioffi

Hybrid Movie Recommender System using NLP techniques for items' features generation.

Supervisor: Elena Maria Baralis. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2022

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Recommender systems address the information overload problem on the Internet by estimating users' preferences and recommending items they might like and interact with. For online companies and organizations, these tools have become key components of websites and applications: they boost activity, enhance the customer experience and support users' decision-making through personalization. The growing availability of online information and advances in Deep Neural Networks have driven a transition from traditional methods, such as purely content-based or collaborative filtering, to hybrid models that improve recommendation quality, capture more complex user-item relationships and better handle the user and item cold-start problems. The objective of this Master's thesis, developed during an internship at Data Reply, is to build a hybrid recommender engine that exploits both user-item interactions and their related metadata, working on a practical use case. We create a movie recommender system trained on "The Movies Dataset" from Kaggle, an open-source collection of metadata for thousands of movies (such as genre, language and plot) and millions of user-item ratings, collected from grouplens.org (the Social Computing research lab at the University of Minnesota) and The Movie Database (TMDB), a community-built online movie and TV database. We survey the state of the art in recommender system algorithms, together with their related similarity and evaluation metrics. We train, evaluate and compare several recommendation models: a pure content-based model whose top-K ranking is based solely on similarities between user profile vectors and movie vectors, a collaborative-filtering model built using matrix factorization over the interaction matrix and, finally, hybrid models exploiting both ratings and user-item metadata.
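The content-based ranking described above can be illustrated with a minimal sketch. It assumes, for illustration only, that each movie is a small hand-made genre vector and that a user profile is the rating-weighted average of the vectors of the movies that user rated; the thesis may build profiles and features differently. The names `user_profile`, `top_k`, `MOVIES` and `RATINGS` are hypothetical, and the data is a toy example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def user_profile(ratings, movie_vectors):
    """Rating-weighted average of the vectors of the movies a user has rated."""
    dim = len(next(iter(movie_vectors.values())))
    profile = [0.0] * dim
    total = sum(ratings.values())
    for movie_id, rating in ratings.items():
        for i, x in enumerate(movie_vectors[movie_id]):
            profile[i] += rating * x
    return [p / total for p in profile] if total else profile


def top_k(ratings, movie_vectors, k):
    """Rank movies the user has not rated by cosine similarity to the user's profile."""
    profile = user_profile(ratings, movie_vectors)
    candidates = [m for m in movie_vectors if m not in ratings]
    scored = [(m, cosine(profile, movie_vectors[m])) for m in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data (assumption): vector components are [action, comedy, drama].
MOVIES = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.0, 1.0, 0.0],
    "C": [1.0, 0.0, 1.0],
    "D": [0.0, 1.0, 1.0],
    "E": [1.0, 0.0, 0.0],
}
RATINGS = {"A": 5.0, "B": 1.0}  # the user liked an action movie, disliked a comedy

recommendations = top_k(RATINGS, MOVIES, k=2)
print(recommendations)
```

In this toy setting the pure action movie ranks first and the action-drama second, because the profile is dominated by the highly rated action title.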
The main experiments were carried out with LightFM, a hybrid recommendation library provided by the fashion company Lyst. In particular, this thesis investigates how natural language information can help suggest items better tailored to users' preferences. We survey state-of-the-art Natural Language Processing (NLP) techniques for textual feature engineering, from context-free models such as TF-IDF and word2vec to transformer-based models such as BERT and GPT-2. We investigate and implement techniques for document similarity and topic modelling in order to train a hybrid recommendation model that accurately suggests movies based on users' past interactions while also taking advantage of item metadata. Movie synopses are embedded as points in a multi-dimensional space using BERT and grouped into topics with HDBSCAN clustering so that they can be consumed by the recommendation engine. Moreover, we run experiments to validate the ability of these hybrid models to handle user cold-start and item cold-start issues, that is, recommendation settings in which new users or items without past interactions are introduced, including when the models are trained in production.
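As a rough illustration of the simplest text-similarity baseline mentioned above, the sketch below computes TF-IDF vectors for a few synopses and compares them with cosine similarity. It is a minimal, standard-library sketch under stated assumptions: whitespace tokenization, a smoothed IDF of the form log((1 + n) / (1 + df)) + 1, and three invented toy synopses. It does not reproduce the thesis pipeline, which relies on BERT embeddings and HDBSCAN clustering. The names `tfidf_vectors`, `cosine_sparse` and `SYNOPSES` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (assumption): log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (count / len(tokens)) * idf[t] for t, count in tf.items()})
    return vectors


def cosine_sparse(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy synopses (invented for illustration, not taken from the dataset).
SYNOPSES = [
    "a detective hunts a killer in the city",
    "a detective chases a killer across the city",
    "two friends open a bakery in a small village",
]

vecs = tfidf_vectors(SYNOPSES)
sim_01 = cosine_sparse(vecs[0], vecs[1])  # two crime plots
sim_02 = cosine_sparse(vecs[0], vecs[2])  # crime plot vs. bakery plot
print(sim_01, sim_02)
```

The two crime synopses score higher than the crime and bakery pair because they share more informative terms. A context-free representation like this cannot recognize paraphrases that use different words, which is one motivation for the transformer-based embeddings the abstract describes.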

Supervisors: Elena Maria Baralis
Academic year: 2021/22
Publication type: Electronic
Number of pages: 113
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: DATA Reply S.r.l. con Unico Socio
URI: http://webthesis.biblio.polito.it/id/eprint/22582