
Scientific Papers Slide Generation using Abstractive Text Summarization

Simone Manni

Scientific Papers Slide Generation using Abstractive Text Summarization.

Supervisors: Luca Cagliero, Moreno La Quatra. Politecnico di Torino, Master's degree in Computer Engineering (Ingegneria Informatica), 2021

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.


Slide-based presentations are increasingly important in scientific dissemination because they condense much of the information needed to understand a publication. They usually contain short summaries of the paper's main contributions and cover all the sections of the original publication. Manually generating slide content, however, is an expensive task. Recent advances in machine learning and artificial intelligence have enabled automatic systems that aim to generate summaries from scientific articles. Those summaries can be used to reduce the amount of content that requires manual analysis. Limited research effort has been devoted to generating presentation slides from documents; indeed, this task suffers from a scarcity of publicly available material for benchmarking. This master's thesis proposes a new dataset, APPreD, consisting of pairs of papers and their corresponding presentation slides crawled from the ACL online anthology. It also proposes a deep-learning-based approach that addresses the document-to-slide task. The proposed methodology entails (i) the classification of academic content into IMRaD classes and (ii) the fine-tuning of a pre-trained model for abstractive summarization of section content. The proposed methodology has been trained and tested on benchmark data collections. The implemented system relies only on the textual domain and can be further developed to address the multimodal domain, including multimedia objects. It outperforms state-of-the-art summarization baselines according to several standard evaluation metrics.
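The two-stage methodology described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the keyword lists, the function names (`classify_section`, `summarize`, `paper_to_slides`) and the fallback class are all hypothetical, the classifier is a simple keyword match standing in for the trained IMRaD classifier, and the summarizer is an extractive placeholder (it keeps the leading sentence) marking the spot where a fine-tuned abstractive model would be called.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists; a trained classifier would replace this lookup.
IMRAD_KEYWORDS = {
    "introduction": ("introduction", "background", "related work", "motivation"),
    "methods": ("method", "approach", "model", "architecture", "dataset"),
    "results": ("result", "experiment", "evaluation", "analysis"),
    "discussion": ("discussion", "conclusion", "future work", "limitation"),
}


def classify_section(heading: str) -> str:
    """Stage (i) stand-in: map a section heading to an IMRaD class by keyword."""
    lowered = heading.lower()
    for label, keywords in IMRAD_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return "methods"  # arbitrary fallback chosen for this sketch


def summarize(text: str, max_sentences: int = 1) -> str:
    """Stage (ii) placeholder: keep the leading sentences of the section.

    This is extractive, not abstractive; a fine-tuned pre-trained
    summarization model would be invoked here instead.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


def paper_to_slides(sections):
    """Group one summary bullet per section under its IMRaD class.

    `sections` is a list of (heading, body) pairs.
    """
    slides = defaultdict(list)
    for heading, body in sections:
        slides[classify_section(heading)].append(summarize(body))
    return dict(slides)


# Toy input invented for illustration only.
paper = [
    ("Introduction", "Slides help dissemination. They are costly to write."),
    ("Proposed approach", "We fine-tune a summarizer. It runs per section."),
    ("Experimental results", "The system beats baselines. Details follow."),
]
slides = paper_to_slides(paper)
for imrad_class, bullets in slides.items():
    print(imrad_class, bullets)
```

Keeping classification and summarization as separate functions mirrors the two steps listed in the abstract, so either stand-in can be swapped for a learned model without touching the other.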

Supervisors: Luca Cagliero, Moreno La Quatra
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 62
Degree programme: Master's degree in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/21086