
Applying Natural Language Processing techniques to analyze HIV-related discussions on Social Media

Antonino Angi

Applying Natural Language Processing techniques to analyze HIV-related discussions on Social Media.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree course in Computer Engineering (Ingegneria Informatica), 2020

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.


Social media are now used to monitor the spread of viruses and to share important prevention and treatment information. They have also enabled communities of people affected by the same disease to form, offering one another strength, comfort and advice. The objective of this work is to extract and understand discussions about HIV on a popular social media platform: Twitter, a micro-blogging application. Tweets with the hashtag #HIV were collected over one year, from November 12th 2018 to November 12th 2019. They were then filtered and cleaned using NLP techniques, which removed duplicates, non-English texts and uninformative content such as tweets containing only URLs, mentions or hashtags. After the cleaning phase, the main analyses carried out were sentiment analysis and content analysis, which used data mining and text mining algorithms to reveal the emotions expressed in the tweets and the most influential topics in discussions about HIV. This study illustrates the potential of using social media to analyze the spread of viruses and health conditions by applying two types of analysis to the same topic and dataset: sentiment analysis and content analysis. HIV-related messages were used by organizations and credible sources to disseminate information about treatment and prevention, but also by individual users to share their thoughts, emotions and experiences of living with HIV. Twitter is also used by celebrities and health authorities to respond to public concerns. This work shows that many tweets are written to give information and emotional support with assistance from the online community, and also for health care professionals who support individuals living with HIV/AIDS.
The algorithms and notions covered in this work can subsequently be used by the public health community or data scientists to analyze tweets regarding other viruses or diseases, showing how social media can be used to identify, detect and study outbreaks in a specific geographical area and in a specific period of time.
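
The cleaning phase described in the abstract can be illustrated with a minimal sketch. This is not the code from the thesis: the function names (`strip_noise`, `clean_tweets`), the regular expressions and the sample tweets are all hypothetical, and duplicates are assumed here to be tweets whose text is identical once URLs, mentions and hashtags are removed. Language filtering is left out because it would need an external language-detection library.

```python
import re

# Assumed patterns for the tokens the abstract treats as uninformative.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def strip_noise(text):
    """Remove URLs, mentions and hashtags, then collapse whitespace."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def clean_tweets(tweets):
    """Drop duplicates and tweets made up only of URLs, mentions or hashtags."""
    seen = set()
    kept = []
    for tweet in tweets:
        residual = strip_noise(tweet)
        if not residual:
            continue  # nothing left besides URLs, mentions or hashtags
        key = residual.lower()
        if key in seen:
            continue  # assumed duplicate: same text, e.g. with a different link
        seen.add(key)
        kept.append(tweet)
    return kept


# Hypothetical sample data, not taken from the thesis dataset.
sample = [
    "Get tested today #HIV https://example.org/a",
    "Get tested today #HIV https://example.org/b",
    "#HIV @someone https://example.org/c",
    "Living with #HIV for ten years and thriving",
]
cleaned = clean_tweets(sample)
```

With this sample, the second tweet is dropped as a duplicate of the first and the third is dropped because it contains no text of its own, so `cleaned` keeps the first and fourth tweets.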

Supervisors: Paolo Garza
Academic year: 2019/20
Publication type: Electronic
Number of Pages: 79
Degree course: Master's degree course in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Joint supervision institution: AALTO UNIVERSITY OF TECHNOLOGY - School of Science (FINLAND)
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/15239