
Machine Learning and Deep Learning techniques for dress product recommendation, dynamic pricing and counterfeit detection

Luca Campana

Machine Learning and Deep Learning techniques for dress product recommendation, dynamic pricing and counterfeit detection.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2022

Abstract:

This master's thesis describes a project that culminated in the development of three ML/DL-based algorithms for dress product recommendation, dynamic pricing, and counterfeit detection. The KDD (Knowledge Discovery in Databases) process is addressed in all its constituent phases. The starting point is data collection, performed through web scraping and web crawling of the well-known apparel e-commerce platform YOOX, specifically the section dedicated to women's dresses. After the data was acquired, we performed a preliminary exploration step focused on missing values, and then moved on to the actual preprocessing transformations. Here, a tailored strategy was devised for each field, including standardization, normalization, encoding of nominal attributes (one-hot and binary encoding), tokenization of textual attributes, and imputation of missing values. Particular attention was given to the handling of visual data, i.e. the dress pictures. Using a pretrained version of the MobileNet architecture (with weights from ImageNet training), we performed feature extraction; the dimensionality was then further reduced, without loss of information, by means of PCA. The preprocessing step ended with the transformation of every dress sample into a numeric array of 931 values that could be fed to machine learning models. The next phase concerned the setup and initialization of the actual models. To address the three tasks, we used differently initialized versions of the KNN neighborhood analysis method. Since the data lacks target values, KNN was used here not to produce a final classification or regression value but in its most basic sense, with the sole purpose of retrieving the neighbors of the input, that is, its closest samples in the submitted dataset. The developed KNN models share the same distance metric, the Euclidean distance.
They differ, however, in the chosen value of k. For the product recommendation task, the algorithm outputs are simply the retrieved neighbors. In this case, k was set to 5, following the example of several real e-commerce platforms. For the dynamic pricing and counterfeit detection tasks, the devised solutions were somewhat more sophisticated. Here, the neighborhood size was enlarged to 200 and the retrieved neighbors' prices were used to form a distribution, on which a quantile analysis was performed to produce the two final output amounts: the advised selling price and a 'falsification threshold', below which there is a considerable possibility that the submitted item is counterfeit. In practice, the former corresponded to the median of the distribution, while the latter was computed with a formula similar to the one used in box plot analyses. Performance was enhanced by a brand bucketing strategy: each dress was assigned to one of five luxury categories, based on the average selling price of its brand. KNN was then performed only on dresses in the same category as the input, allowing a more meaningful evaluation. Finally, the described algorithms were tested and deployed through a stand-alone, user-friendly web application, which offers some additional functionality (such as an advanced filtering section for product recommendation) and conveys the results through interactive visualizations.
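
The neighbor-based logic described in the abstract can be illustrated with a minimal sketch. It is not the thesis code, whose full text is not available. The record fields `features`, `price`, and `bucket`, the function names, and the toy catalog are hypothetical. The lower fence Q1 - 1.5 * IQR is the standard box plot convention and is assumed here only because the abstract says the formula is "similar" to the box plot one. Filtering by luxury bucket before the recommendation search is also an assumption.

```python
import math
import statistics


def neighbors_in_bucket(query, catalog, k):
    """Return the k catalog items closest to `query` (Euclidean distance),
    searching only among items in the same luxury bucket as the query."""
    same_bucket = [d for d in catalog if d["bucket"] == query["bucket"]]
    ranked = sorted(
        same_bucket,
        key=lambda d: math.dist(query["features"], d["features"]),
    )
    return ranked[:k]


def recommend(query, catalog, k=5):
    """Product recommendation: the output is simply the retrieved neighbors."""
    return neighbors_in_bucket(query, catalog, k)


def price_analysis(query, catalog, k=200, whisker=1.5):
    """Return (advised_price, falsification_threshold) from neighbor prices.

    advised_price is the median of the neighbors' prices. The threshold is
    the box plot lower fence Q1 - whisker * IQR (an assumed formula).
    """
    prices = [d["price"] for d in neighbors_in_bucket(query, catalog, k)]
    q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    threshold = q1 - whisker * (q3 - q1)
    return median, threshold


# Toy data: 2-value feature vectors stand in for the 931-value arrays.
catalog = [
    {"features": (0.0, 0.0), "price": 100.0, "bucket": 2},
    {"features": (1.0, 0.0), "price": 120.0, "bucket": 2},
    {"features": (0.0, 2.0), "price": 140.0, "bucket": 2},
    {"features": (3.0, 3.0), "price": 160.0, "bucket": 2},
    {"features": (4.0, 4.0), "price": 180.0, "bucket": 2},
    {"features": (0.1, 0.1), "price": 900.0, "bucket": 4},  # other bucket
]
query = {"features": (0.0, 0.0), "bucket": 2}

similar = recommend(query, catalog, k=2)
advised_price, falsification_threshold = price_analysis(query, catalog, k=5)
```

With this toy catalog, the item in bucket 4 is never considered even though it is very close to the query in feature space, which is the point of the brand bucketing step. A listing price below `falsification_threshold` would flag the item as possibly counterfeit.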

Supervisors: Paolo Garza
Academic year: 2022/23
Publication type: Electronic
Number of pages: 107
Additional information: Confidential thesis. Full text not available
Subjects:
Degree course: Master's degree programme in Data Science And Engineering
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: EY Business & Technology Solution S.r.l.
URI: http://webthesis.biblio.polito.it/id/eprint/25573