
Claudiu Constantin Tcaciuc
Topic-Aware Multi-Stream Text Retrieval for Crisis Related Summarization.
Supervisors: Paolo Garza, Daniele Rege Cambrin. Politecnico di Torino, Master's degree course in Computer Engineering (Ingegneria Informatica), 2025
PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
The amount of information produced daily reaches terabytes of data, and retrieving important facts in a timely manner is proving to be an arduous challenge. Retrieving key information during crisis events is an important task that can help the survival of people in the affected areas. In recent years, with the advance of Natural Language Processing (NLP) models and retrieval systems, multiple solutions have been proposed to address this task, also thanks to the TREC CrisisFACTS challenge, which focuses on temporal retrieval. In this thesis, we exploit topic modeling techniques such as BERTopic, which creates clusters of semantically similar data, in combination with lexical retrieval (BM25), dense retrieval (encoder models), and neural reranking based on the Retrieve & Re-Rank (RR) algorithm, to retrieve information from different sources (Facebook, Twitter, Reddit, and news outlets) and produce a summary of their content in the context of crisis event management. The experimental results show that our solution is able to retrieve useful facts and produce accurate summaries during crisis events. We achieve higher ROUGE and BERTScore values than the mean results obtained by CrisisFACTS participants, without affecting scalability. |
---|---|
Supervisors: | Paolo Garza, Daniele Rege Cambrin |
Academic year: | 2024/25 |
Publication type: | Electronic |
Number of pages: | 65 |
Subjects: | |
Degree course: | Master's degree course in Computer Engineering (Ingegneria Informatica) |
Degree class: | New system > Master's degree > LM-32 - INGEGNERIA INFORMATICA (Computer Engineering) |
Collaborating companies: | Not specified |
URI: | http://webthesis.biblio.polito.it/id/eprint/35461 |