Sofia Perosin
An Extraction-Abstraction Hybrid Approach for Financial Document Summarization.
Supervisors: Luca Cagliero, Moreno La Quatra, Jacopo Fior. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2021
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
The quantity of data that companies manage is already huge and is expected to keep growing. This makes it particularly appealing to design data-driven tools and methodologies capable of handling this information. This thesis project addresses the challenge by implementing a summarization pipeline that offers an overview of the main topics in the analyzed text by generating the corresponding summary and headline. To pursue this objective, a procedure articulated in two main steps is implemented: extractive summarization followed by abstractive summarization. Different approaches are analyzed for the first step, whereas the second step is always implemented with a Transformer architecture.
The general framework is first tested on a general-purpose news dataset, to compare it with state-of-the-art models; it is then fine-tuned on a financial news dataset to produce a model tailored to the financial news summarization problem.
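The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the word-frequency sentence scorer is an assumed stand-in for the extractive approaches the thesis compares, and `placeholder_abstractive_model` is a hypothetical function standing in for the Transformer, which in practice would be a fine-tuned sequence-to-sequence model. All function names here are illustrative.

```python
import re
from collections import Counter
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_top_sentences(text: str, k: int = 2) -> List[str]:
    """Extractive step (assumed scorer): rank sentences by average word frequency."""
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Pick the k best-scoring sentences, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def summarize(text: str, abstractive_model: Callable[[str], str], k: int = 2) -> str:
    """Two-step pipeline: extract k sentences, then rewrite them abstractively."""
    extracted = " ".join(extract_top_sentences(text, k))
    return abstractive_model(extracted)


def placeholder_abstractive_model(text: str) -> str:
    """Hypothetical stand-in for a Transformer: returns the first sentence only."""
    parts = split_sentences(text)
    return parts[0] if parts else ""
```

Passing the abstractive model as a callable keeps the extractive step interchangeable, which mirrors the abstract's setup of varying the first step while the second stays fixed.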
