
Leveraging Wikidata to highlight differences in topics and countries on Instagram Social Network

Carmine De Cristofaro


Supervisors: Luca Vassio, Martino Trevisan. Politecnico di Torino, Master's degree program in Data Science And Engineering, 2023

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

In today's digital era, Online Social Networks (OSNs) have become powerful platforms that shape how we connect, communicate, and share information. Among them, Instagram stands out: it attracts more than a billion users worldwide and has become a significant cultural phenomenon. With its visual content and global reach, Instagram has redefined the social media landscape, crossing geographical boundaries and connecting diverse communities. This thesis studies how Instagram usage affects the homogenization or preservation of geographical and cultural identities. To this end, it analyzes Instagram profiles from five European countries (Italy, France, Germany, the United Kingdom, and Spain) across three social macro-areas: politics, culture (specifically athletes, models, and actors), and academia (university profiles).

The first phase of the work relies heavily on Wikidata and its SPARQL query language to build a profile database containing the information needed to categorize Instagram users. These users are either individuals or organizations, such as universities, that belong to one of the countries and social areas of interest. The profile lists, grouped by social area, are then loaded into CrowdTangle, a Meta tool, to collect each user's Instagram posts from 2022 together with related information (e.g., interactions in the form of likes and comments, profile followers, and post descriptions). The resulting dataset comprises 6,939 Instagram profiles and 401,495 posts. The dataset is then processed to visualize the main statistical characteristics of the profiles and of the posts they published.
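The Wikidata step can be illustrated with a SPARQL query sent to the public Wikidata Query Service. The Python below is a minimal sketch under stated assumptions, not the query used in the thesis: it assumes profiles are selected by occupation (P106), country of citizenship (P27), and the Instagram username property (P2003), and the helper names `build_occupation_query` and `run_query` are hypothetical. Only politicians (Q82955) are shown; the other categories would need their own item IDs, and universities would presumably be matched with an "instance of" (P31) constraint instead of an occupation.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# Wikidata item IDs of the five countries in the study.
COUNTRIES = {
    "Italy": "Q38",
    "France": "Q142",
    "Germany": "Q183",
    "United Kingdom": "Q145",
    "Spain": "Q29",
}

POLITICIAN = "Q82955"  # occupation item used in the example (assumed category mapping)


def build_occupation_query(occupation_qid: str, country_qid: str) -> str:
    """Hypothetical helper: SPARQL for people with a given occupation and
    citizenship who have an Instagram username on Wikidata.

    P106 = occupation, P27 = country of citizenship, P2003 = Instagram username.
    """
    return f"""
SELECT ?person ?personLabel ?instagram WHERE {{
  ?person wdt:P106 wd:{occupation_qid} ;
          wdt:P27 wd:{country_qid} ;
          wdt:P2003 ?instagram .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""


def run_query(query: str) -> list[dict]:
    """Send a query to the endpoint and return its JSON result bindings (needs network access)."""
    url = WIKIDATA_SPARQL + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # The Wikidata Query Service asks clients to identify themselves with a User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "instagram-profile-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

For example, `run_query(build_occupation_query(POLITICIAN, COUNTRIES["Italy"]))` would return one binding per matching person, with the username in `row["instagram"]["value"]`; such usernames could then be exported as a profile list for a tool like CrowdTangle.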
For profiles, the analyses focus on the number of followers and on activity over the year; for posts, they cover post types, descriptions, and interactions (passive through likes, active through comments). The post descriptions are used for topic recognition, which identifies the discussion topics addressed during the year by each study category across the European landscape. Topic recognition is performed with BERTopic, a topic modeling technique that extracts coherent topic representations through a class-based variant of TF-IDF (c-TF-IDF). Specifically, the model generates document embeddings with pre-trained transformer-based language models, clusters these embeddings, and then builds topic representations with the class-based TF-IDF procedure. Finally, the thesis focuses on the top five profiles in each category and country, analyzing the temporal trends of their activity, followers, and interactions to derive empirical insights into the real-world events that influenced their Instagram post streams.

Overall, the study contributes to understanding the role of social media in contemporary society, particularly its impact on politics, culture, and academia. Using Instagram data, it provides insights into users' digital behaviors and trends that can help policymakers and businesses formulate targeted policies and effective marketing strategies. It also points to ways in which individuals and organizations can use social media to foster positive engagement, cultural exchange, and knowledge dissemination.
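To make the class-based TF-IDF step concrete, the sketch below re-implements in plain Python the weighting formula given in the BERTopic documentation, W(t, c) = tf(t, c) * log(1 + A / f(t)), where tf(t, c) is the frequency of term t in class c (normalized within the class), f(t) is the frequency of t across all classes, and A is the average number of words per class. It is an illustration, not BERTopic's own implementation: it assumes the embedding and clustering steps have already assigned each post description to a topic, uses a naive whitespace tokenizer, and the function name `c_tf_idf` is hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_class: dict[str, list[str]], top_n: int = 3) -> dict[str, list[tuple[str, float]]]:
    """Rank terms per class with W(t, c) = tf(t, c) * log(1 + A / f(t)).

    tf(t, c): count of term t in class c divided by the total words in c (L1 normalization)
    f(t):     count of term t across all classes
    A:        average number of words per class
    """
    # Treat each class (topic) as one large document: concatenate its posts and count tokens.
    class_counts = {
        c: Counter(word for doc in docs for word in doc.lower().split())
        for c, docs in docs_per_class.items()
    }
    # Corpus-wide term frequencies f(t).
    total_counts: Counter = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)
    avg_words = sum(total_counts.values()) / len(class_counts)

    result = {}
    for c, counts in class_counts.items():
        n_words = sum(counts.values())
        if n_words == 0:  # empty class: nothing to rank
            result[c] = []
            continue
        scores = {
            t: (n / n_words) * math.log(1 + avg_words / total_counts[t])
            for t, n in counts.items()
        }
        # Highest weight first; ties broken alphabetically for a deterministic order.
        result[c] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return result
```

Terms that are frequent in one topic but rare in the others receive the highest weights, which is what makes them useful as topic labels; a term shared by every class is pushed down by the log factor. In practice, the `bertopic` package wraps embedding, clustering, and c-TF-IDF behind `BERTopic().fit_transform(docs)`.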

Supervisors: Luca Vassio, Martino Trevisan
Academic year: 2023/24
Publication type: Electronic
Number of pages: 118
Degree program: Master's degree program in Data Science And Engineering
Degree class: New regulations > Master's degree > LM-32 - Computer Engineering
Collaborating organizations: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/28444