Zero-shot product retrieval with contrastive learning

Lorenzo Cravero

Zero-shot product retrieval with contrastive learning.

Supervisors: Giuseppe Rizzo, Lorenzo Bongiovanni. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2023

License: Creative Commons Attribution-NonCommercial-NoDerivatives.

Abstract:

While developing a platform that compares product prices and characteristics across Italian e-commerce websites, we needed to retrieve similar products across different and ever-changing sites. This thesis frames the problem as a retrieval task: given a starting product, retrieve similar products, where each product is defined solely by its textual attributes. Particular focus is placed on the zero-shot scenario, in which model performance is evaluated on both products and websites not seen during training. The goal is a model that generalizes well and is easy to deploy in the real-world use case. The thesis builds on two language models pre-trained on extensive general corpora, BERT and MPNet, and demonstrates the benefits of applying a supervised contrastive learning objective during fine-tuning when retrieving new, unseen products in a zero-shot fashion. A more in-depth analysis shows that this approach enriches the product embeddings by improving both their alignment and the uniformity of their distribution. Finally, the resulting embeddings can be searched quickly with exact k-nearest-neighbor (KNN) search, approximate nearest-neighbor (ANN) search, or optimized similarity-search libraries such as FAISS. Overall, the thesis aims to present a valid approach to product retrieval in a zero-shot context and offers insights that may be relevant to real-world data integration scenarios.
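
As an illustration of the retrieval and embedding-quality ideas mentioned in the abstract, the following is a minimal sketch in plain Python. It is not taken from the thesis. The function names (`top_k`, `alignment`, `uniformity`) and the toy two-dimensional vectors are hypothetical. The two metrics assume the commonly used definitions of alignment and uniformity on the unit hypersphere (Wang and Isola, 2020), which may differ from those used in the thesis. The search is brute-force exact KNN over cosine similarity, standing in for the faster ANN or FAISS-based search a real deployment would likely use.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query, catalog, k):
    """Exact (brute-force) k-nearest-neighbor search by cosine similarity.

    Returns the indices of the k catalog embeddings most similar to the query.
    """
    q = normalize(query)
    scored = []
    for idx, emb in enumerate(catalog):
        e = normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, e)), idx))
    scored.sort(key=lambda pair: -pair[0])  # highest similarity first
    return [idx for _, idx in scored[:k]]


def alignment(positive_pairs):
    """Mean squared distance between embeddings of matching products.

    Assumed definition; lower values mean matching products lie closer together.
    """
    total = 0.0
    for a, b in positive_pairs:
        a, b = normalize(a), normalize(b)
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return total / len(positive_pairs)


def uniformity(embeddings, t=2.0):
    """Log of the mean Gaussian potential over all distinct embedding pairs.

    Assumed definition; lower values mean embeddings are spread more evenly.
    """
    embs = [normalize(e) for e in embeddings]
    potentials = []
    for i in range(len(embs)):
        for j in range(i + 1, len(embs)):
            d2 = sum((x - y) ** 2 for x, y in zip(embs[i], embs[j]))
            potentials.append(math.exp(-t * d2))
    return math.log(sum(potentials) / len(potentials))


# Toy usage: hypothetical 2-D "product embeddings".
catalog = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.05]
nearest = top_k(query, catalog, k=2)
```

In this toy setting, `nearest` holds the indices of the two catalog vectors pointing in almost the same direction as the query. A fine-tuned encoder would supply the real embeddings, and the same two metrics could then be compared before and after fine-tuning.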

Supervisors: Giuseppe Rizzo, Lorenzo Bongiovanni
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 72
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: FONDAZIONE LINKS-LEADING INNOVATION & KNOWLEDGE
URI: http://webthesis.biblio.polito.it/id/eprint/27107