Automatic Extraction of Slides from Scientific Papers

Yaxue Du

Automatic Extraction of Slides from Scientific Papers.

Supervisors: Luca Cagliero, Laura Farinetti. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2021

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.


Presentation slides are an important and effective way to deliver the key information of a scientific paper, especially at conferences. Speakers normally spend a considerable amount of time preparing slides because scientific papers are difficult to read and summarize. Automatically extracting slides from scientific papers can save much of this time by providing the keywords and bullet points. [Yue Hu et al., 2015] introduced a system called PPSGen, which automatically generates presentation slides for academic papers; these slides can be used as drafts for the final presentation. The tool applies a Support Vector Regressor (SVR) model for sentence ranking, and then an Integer Linear Programming (ILP) model selects the important sentences under a set of sophisticated constraints. This work proposes a system that extends PPSGen to automatically extract presentation slides from scientific papers. It first applies an unsupervised summarization algorithm to rank the sentences of the input paper by relative importance. Then it selects the key sentences and phrases to include in each slide using an optimization-based algorithm built on an Integer Linear Programming (ILP) module. Finally, the selected key sentences and phrases are used to output well-structured slides. The ROUGE evaluation is based on a test set of 195 paper-slide pairs collected from the web. Our goal is to compare the performance of unsupervised and supervised learning models, to verify whether an unsupervised learning model can achieve a better ROUGE score than a supervised one, and ultimately to find a better-performing system for automatically extracting presentation slides from scientific papers.
Based on our current ROUGE evaluation results, supervised learning models perform better than unsupervised learning models, and RFR achieves higher ROUGE scores (ROUGE-1-F, ROUGE-2-F, ROUGE-SU4-F) than the other supervised learning models.
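
The rank, select, and evaluate steps described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the function names (`rank_sentences`, `select_sentences`, `rouge1_f`), the word-overlap centrality used as the unsupervised ranking score, and the single word-budget constraint. The selection step is the kind of 0/1 problem an ILP solver would handle; here exhaustive search stands in for the solver because the toy input is tiny. The ROUGE-1-F function is a plain unigram F1 without stemming or stopword handling, so it only approximates the official metric.

```python
from collections import Counter
from itertools import combinations
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_sentences(sentences):
    """Score each sentence by its average Jaccard word overlap with the others.

    Hypothetical stand-in for the unsupervised ranking step.
    """
    bags = [set(tokenize(s)) for s in sentences]
    scores = []
    for i, a in enumerate(bags):
        overlaps = [
            len(a & b) / len(a | b)
            for j, b in enumerate(bags)
            if j != i and (a | b)
        ]
        scores.append(sum(overlaps) / len(overlaps) if overlaps else 0.0)
    return scores


def select_sentences(sentences, scores, max_words):
    """Pick the subset with the highest total score within a word budget.

    Exhaustive search replaces the ILP solver; only viable for tiny inputs.
    """
    lengths = [len(tokenize(s)) for s in sentences]
    best, best_score = (), 0.0
    for k in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), k):
            if sum(lengths[i] for i in subset) > max_words:
                continue  # violates the word-budget constraint
            total = sum(scores[i] for i in subset)
            if total > best_score:
                best, best_score = subset, total
    return [sentences[i] for i in best]


def rouge1_f(candidate, reference):
    """Unigram-overlap F1 between a candidate and a reference text."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Toy input, invented for this example.
paper = [
    "Slides summarize the key points of a paper.",
    "Sentence ranking scores each sentence of the paper.",
    "An ILP model selects key sentences for the slides.",
    "We thank the reviewers.",
]
scores = rank_sentences(paper)
bullets = select_sentences(paper, scores, max_words=18)
reference = "Key sentences of the paper are selected for the slides."
print(bullets)
print(rouge1_f(" ".join(bullets), reference))
```

A real system would replace the exhaustive search with an ILP solver and add further constraints, since the number of subsets grows exponentially with the number of sentences.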

Supervisors: Luca Cagliero, Laura Farinetti
Academic year: 2020/21
Publication type: Electronic
Number of Pages: 50
Degree programme: Master's degree programme in Ingegneria Informatica (Computer Engineering)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/18171