Topic-Aware Multi-Stream Text Retrieval for Crisis Related Summarization

Claudiu Constantin Tcaciuc

Supervisors: Paolo Garza, Daniele Rege Cambrin. Politecnico di Torino, Master's degree in Computer Engineering (Ingegneria Informatica), 2025

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

The amount of information produced daily reaches terabytes of data, and retrieving important facts in a timely manner is proving to be an arduous challenge. Retrieving key information during crisis events is an important task that can help the survival of people in the affected areas. In recent years, advances in Natural Language Processing (NLP) models and retrieval systems have led to multiple proposed solutions for this task, helped also by the TREC CrisisFACTS challenge, which focuses on temporal retrieval. In this thesis, we combine topic modeling techniques such as BERTopic, which creates clusters of semantically similar data, with lexical and dense retrieval (BM25 and encoder models) and neural reranking based on the Retrieve & Re-Rank (RR) algorithm. The resulting pipeline retrieves information from different types of online sources (Facebook, Twitter, Reddit, and news outlets) in order to produce a summary of their content in the context of crisis event management. The experimental results show that our solution is able to retrieve useful facts and produce accurate summaries during crisis events. We achieve higher ROUGE and BERTScore values than the mean results obtained by CrisisFACTS participants, without affecting scalability.
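
As a rough illustration of the two-stage retrieve-and-rerank idea mentioned in the abstract, the following is a minimal sketch and not the thesis implementation. It assumes a whitespace tokenizer, a hand-written Okapi BM25 scorer for the lexical first stage, and a simple token-overlap function (`overlap_reranker`, a hypothetical name) standing in for the neural reranker. The topic modeling and dense retrieval stages are omitted, and the example posts are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately crude assumption)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def overlap_reranker(query, doc):
    """Placeholder for a neural reranker: Jaccard overlap of token sets."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def retrieve_and_rerank(query, docs, top_k=3, final_k=2, reranker=overlap_reranker):
    """Stage 1: BM25 shortlist. Stage 2: rescore the shortlist with `reranker`."""
    first = bm25_scores(query, docs)
    shortlist = sorted(range(len(docs)), key=lambda i: first[i], reverse=True)[:top_k]
    reranked = sorted(shortlist, key=lambda i: reranker(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:final_k]]


# Invented example posts, for illustration only.
posts = [
    "Evacuation ordered for residents near the wildfire",
    "Local bakery wins regional award",
    "Wildfire evacuation shelter opened at the high school",
    "Road closures reported due to wildfire smoke",
]
results = retrieve_and_rerank("wildfire evacuation", posts)
```

The cheap lexical stage narrows the candidate set so that a more expensive scorer only has to process a short list. In a real system the `reranker` argument would presumably wrap a neural model rather than the overlap heuristic used here.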

Supervisors: Paolo Garza, Daniele Rege Cambrin
Academic year: 2024/25
Publication type: Electronic
Number of pages: 65
Subjects:
Degree programme: Master's degree in Computer Engineering (Ingegneria Informatica)
Degree class: New system > Master's degree > LM-32 - Computer Engineering (INGEGNERIA INFORMATICA)
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/35461