
Video lectures summarization

Irene Benedetto

Video lectures summarization.

Supervisors: Laura Farinetti, Lorenzo Canale, Luca Cagliero. Politecnico di Torino, Master's degree program in Data Science and Engineering, 2021

License: Creative Commons Attribution Non-commercial No Derivatives.


With the recent advances in e-learning platforms and the spread of distance learning, there is growing interest in generating content that accompanies traditional video lessons and seeks to improve teaching quality. In this context, the thesis analyzes and proposes approaches for creating a compact representation of video lectures. It aims to compare different techniques from the NLP and deep learning fields for summarizing the content of video lecture transcripts. An Automatic Speech Recognition (ASR) system first transcribes the audio directly into text. Next, summarization models summarize the key parts of the transcription. Text summarization and related tasks have been extensively studied in the literature; transcript summarization, by contrast, has not been fully explored. Compared with text summarization, the transcript summarization task presents some more critical issues: raw transcripts typically do not contain punctuation marks and are difficult for traditional text summarization techniques to segment and process. Moreover, speech differs from plain text in some syntactical features: it may include more redundant information than plain text, caused by repetition, or, on the contrary, may convey linguistic and paralinguistic information that transcription systems are not able to capture and encode. Lastly, it is important to point out that the conversion from audio to text may introduce errors due to the transcription process. These limitations complicate the use of text summarization models on speech transcripts. To the best of our knowledge, there is currently no study focusing on the summarization of educational content, given the absence of a specific dataset. For these reasons, the thesis focuses on analyzing the domain adaptation capabilities of the models and presents a novel, ad hoc dataset based on MIT OpenCourseWare video lectures.
In the first part, the work explores different approaches to punctuation restoration for speech transcripts. In the second part, the dissertation examines state-of-the-art approaches to text summarization and, in particular, summarization in the meeting domain, the domain available in the literature that is closest to the educational one. A section is dedicated to evaluating the results on an educational dataset, EduSumm, which contains transcriptions and summaries of MIT and Politecnico video lectures. We found that although state-of-the-art meeting summarization models are pre-trained on large datasets and fine-tuned on meeting data, they show poor performance and generalization capability when the domain changes. The thesis proposes a novel approach that has proven more robust to domain shift, even though it uses simpler, more lightweight models.

Supervisors: Laura Farinetti, Lorenzo Canale, Luca Cagliero
Academic year: 2020/21
Publication type: Electronic
Number of Pages: 156
Degree program: Master's degree program in Data Science and Engineering
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/19175