
Micro Influencer Classifier: an academic and economic approach

Paolo Fiorio Pla

Micro Influencer Classifier: an academic and economic approach.

Supervisors: Luca Ardito, Simone Leonardi. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2022

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

The advent of social networks in the last decade has significantly changed the concept of popularity and its relationship with the public, opening up a new way of managing the market. A new key figure has emerged, the influencer: someone who has the power to affect others' purchasing decisions because of their relevance, knowledge and relationship with their audience. In an ever-widening roster, brands are looking for better ways to identify suitable influencers; this is even more challenging with micro influencers, who are more affordable but difficult to discover. Micro influencers are not prominent figures, celebrities or world-renowned experts; they specialize in a particular topic and share content about their interests only. Their high Return On Investment (ROI), commitment and persuasive power in their communities make them highly sought after on the market. This thesis addresses the identification challenge by providing a framework for both academic and economic use on Twitter and Instagram. The academic approach starts with the creation of ad hoc datasets for both social networks, given a heterogeneous list of topics as input. For every topic, a balanced mixture of micro and non-micro influencers is selected and extended with the most suitable user metrics, covering both general account metadata and actual engagement with the audience. A further analysis based on Natural Language Processing (NLP) methods is adopted to better understand the communication techniques that characterize micro influencers in both tweets and Instagram posts. Specifically for Instagram, post descriptions are enriched with text obtained through Image Captioning methods applied to the corresponding pictures. Once the text has been acquired, a cleaning and preprocessing step is applied, comprising language detection, emoji translation, and punctuation and stopword removal.
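The cleaning step can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `clean_text` function, the emoji lookup table and the stopword list are hypothetical placeholders, and language detection is left out because it would require a third-party library.

```python
import string

# Hypothetical, tiny lookup tables for illustration only; the abstract does
# not say which emoji dictionary or stopword list was actually used.
EMOJI_TO_TEXT = {"🔥": "fire", "😂": "laughing"}
STOPWORDS = {"a", "an", "the", "is", "are", "this", "of", "and", "to", "in", "so"}


def clean_text(text: str) -> list[str]:
    """Return the cleaned tokens of a post (assumed pipeline, not the thesis code)."""
    # Replace each known emoji with a word, padded so it stays a separate token.
    for emoji, word in EMOJI_TO_TEXT.items():
        text = text.replace(emoji, f" {word} ")
    # Remove ASCII punctuation, then lowercase and split on whitespace.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.lower().split()
    # Drop stopwords.
    return [token for token in tokens if token not in STOPWORDS]


print(clean_text("This pizza is 🔥!!!"))  # ['pizza', 'fire']
```

Translating emojis before removing punctuation keeps their meaning available to the later word-frequency and sentiment steps.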
Subsequently, two more steps are performed on the cleaned text for each user to complete the dataset: computing how frequently the topic word is used across all posts, and a sentiment analysis returning a positive, neutral or negative score. Finally, after an analysis of the distributions of all the evaluated metrics, each user receives a score based on their position within the distribution of each metric. In this context, only users with a score higher than the dataset's average score receive the status of micro influencer for their specific topic. Once the extended dataset has been built, the best classification model is selected, concluding the academic approach. In this work, the classifier that achieves the best performance on both social networks is eXtreme Gradient Boosting (XGBoost), with an accuracy that reaches 100% on the training set and exceeds 90% on the test set. Specifically, the XGBoost classifier is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework. As the last step, the selected model is used in the commercial outcome of this work: a framework that lets a brand input users and a topic of interest, retrieves all the necessary data, evaluates whether each user can be considered a micro influencer for that specific topic, and outputs the results. Despite the limited amount of data available due to social network restrictions, the framework achieves satisfactory results, which can be improved in future work by developing less constrained libraries.
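The labeling rule described in the abstract (score each user by their position in every metric's distribution, then keep users above the dataset average) can be sketched as follows. The abstract does not state how positions are turned into a score, so the percentile rank, the equal-weight average across metrics, and the names `percentile_rank`, `label_users`, `engagement` and `topic_freq` are all assumptions made for illustration.

```python
from statistics import mean


def percentile_rank(values, x):
    """Fraction of values less than or equal to x (assumed positioning measure)."""
    return sum(v <= x for v in values) / len(values)


def label_users(metrics):
    """Map each user to True if their score exceeds the dataset's average score.

    metrics: dict of user -> dict of metric name -> numeric value.
    """
    names = sorted(next(iter(metrics.values())))
    # Collect the distribution of each metric over all users.
    columns = {m: [vals[m] for vals in metrics.values()] for m in names}
    # Assumed scoring: equal-weight average of per-metric percentile ranks.
    scores = {
        user: mean([percentile_rank(columns[m], vals[m]) for m in names])
        for user, vals in metrics.items()
    }
    threshold = mean(list(scores.values()))
    return {user: score > threshold for user, score in scores.items()}


# Made-up example values, not data from the thesis.
users = {
    "a": {"engagement": 0.08, "topic_freq": 0.50},
    "b": {"engagement": 0.02, "topic_freq": 0.10},
    "c": {"engagement": 0.03, "topic_freq": 0.20},
    "d": {"engagement": 0.01, "topic_freq": 0.05},
}
labels = label_users(users)
print(labels)  # {'a': True, 'b': False, 'c': True, 'd': False}
```

Labels produced this way would then serve as the target for training a classifier such as the XGBoost model mentioned above.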

Supervisors: Luca Ardito, Simone Leonardi
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 93
Subjects:
Degree programme: Master's degree programme in Data Science and Engineering
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/22681