
An Extraction-Abstraction Hybrid Approach for Financial Document Summarization

Sofia Perosin

An Extraction-Abstraction Hybrid Approach for Financial Document Summarization.

Supervisors: Luca Cagliero, Moreno La Quatra, Jacopo Fior. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2021

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

The quantity of data that companies manage is already huge and is expected to keep growing. This makes data-driven tools and methodologies capable of handling that information particularly appealing. This thesis addresses the challenge by implementing a summarization pipeline that gives an overview of the main topics in the analyzed text by generating a corresponding summary and headline. The pipeline consists of two main steps: extractive summarization followed by abstractive summarization. Several approaches are analyzed for the first step, whereas the second is always implemented with a Transformer architecture. The general framework is first tested on a general-purpose news dataset, to compare it with state-of-the-art models, and is then fine-tuned on a financial news dataset to obtain a model tailored to financial news summarization. Future work should focus on improving the reliability of the procedure by creating more specific financial datasets on which to fine-tune the model. To the best of our knowledge, publicly available financial datasets are lacking, which limits the achievable performance.
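The two-step structure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the method used in the thesis: the extractive step is a simple word-frequency heuristic, and the abstractive step is a hypothetical placeholder (`placeholder_abstractor`) standing in for the Transformer model, which could be passed in as any callable that maps text to text.

```python
import re
from collections import Counter
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extract_top_sentences(text: str, k: int = 2) -> List[str]:
    """Extractive step (illustrative heuristic, not the thesis method):
    keep the k sentences with the highest average word frequency,
    returned in their original order."""
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore document order
    return [sentences[i] for i in keep]


def placeholder_abstractor(text: str) -> str:
    """Hypothetical stand-in for the abstractive step: truncates to 12 words.
    A Transformer-based summarizer would be plugged in here instead."""
    return " ".join(text.split()[:12])


def summarize(text: str, k: int = 2,
              abstractor: Callable[[str], str] = placeholder_abstractor) -> str:
    """Hybrid pipeline: extract salient sentences, then rewrite them."""
    extracted = " ".join(extract_top_sentences(text, k))
    return abstractor(extracted)
```

Passing the abstractive model as a callable keeps the two steps independent, so different extractive strategies can be compared while the second step stays fixed, which mirrors the setup the abstract describes.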

Supervisors: Luca Cagliero, Moreno La Quatra, Jacopo Fior
Academic year: 2021/22
Publication type: Electronic
Number of pages: 93
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/21087