Sofia Perosin
An Extraction-Abstraction Hybrid Approach for Financial Document Summarization.
Supervisors: Luca Cagliero, Moreno La Quatra, Jacopo Fior. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2021
PDF (Tesi_di_laurea), thesis, 4MB
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
Companies today manage huge quantities of data, and these volumes are expected to keep growing. This makes it particularly appealing to design data-driven tools and methodologies capable of handling this information. This thesis addresses the challenge by implementing a summarization pipeline that offers an overview of the main topics in the analyzed text by generating a corresponding summary and headline. The procedure consists of two main steps: extractive summarization followed by abstractive summarization. Different approaches are analyzed for the first step, whereas the second step is always implemented with a Transformer architecture. The general framework is first tested on a general-purpose news dataset, to compare it with state-of-the-art models, and is then fine-tuned on a financial news dataset to obtain a model tailored to financial news summarization. Future work should focus on improving the reliability of the procedure by creating more specific financial datasets that can be used to fine-tune the model. To the best of our knowledge, there is a lack of publicly available financial datasets, which limits the achievable performance. |
---|---|
Supervisors: | Luca Cagliero, Moreno La Quatra, Jacopo Fior |
Academic year: | 2021/22 |
Publication type: | Electronic |
Number of Pages: | 93 |
Subjects: | |
Degree programme: | Master's degree programme in Data Science And Engineering |
Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
Collaborating companies: | UNSPECIFIED |
URI: | http://webthesis.biblio.polito.it/id/eprint/21087 |
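
The two-step procedure described in the abstract (extractive selection followed by abstractive rewriting) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the word-frequency sentence scorer stands in for whichever extractive approaches the thesis compares, the names `extract`, `summarize`, and `placeholder_abstractor` are hypothetical, and the abstractive stage is a stub callable where a fine-tuned Transformer model would be plugged in.

```python
import re
from collections import Counter
from typing import Callable, List

# Small illustrative stopword list (assumption, not from the thesis).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it",
             "that", "on", "for", "by", "its"}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def extract(text: str, k: int) -> List[str]:
    """Step 1 (extractive): keep the k sentences with the highest
    average word frequency, returned in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def summarize(text: str, abstractor: Callable[[str], str], k: int = 2) -> str:
    """Step 2 (abstractive): pass the extracted sentences to a rewriting
    model. `abstractor` is any callable mapping a passage to a summary."""
    return abstractor(" ".join(extract(text, k)))


def placeholder_abstractor(passage: str) -> str:
    """Hypothetical stand-in for a Transformer summarizer: returns the
    passage unchanged so the sketch runs without external dependencies."""
    return passage


if __name__ == "__main__":
    article = ("Shares of the bank rose after strong earnings. "
               "The weather was mild. "
               "The bank raised its earnings forecast for the year.")
    print(summarize(article, placeholder_abstractor, k=2))
```

Keeping the abstractive stage behind a plain callable means the extractive front end can be swapped or evaluated on its own, which mirrors the abstract's statement that several first-step approaches are compared while the second step stays fixed.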