Politecnico di Torino

Improving Document Summarization Using Crosslingual Word Embeddings

Catia Blengino

Supervisors: Luca Cagliero, Paolo Garza. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2020.

Full text: PDF (thesis)
License: Creative Commons Attribution Non-commercial No Derivatives.

In recent years, as the amount of information available online in multiple languages has grown beyond what a user can examine manually, several text summarization techniques have been developed. This thesis proposes a new methodology for extracting the most significant sentences from a collection of textual documents written in multiple languages. Specifically, it aims to produce a summary in any of the source languages while also exploiting the semantic relationships between content written in different languages. To this end, it uses aligned word embedding models to capture cross-lingual relationships and a graph-based approach to select the most significant sentences. The results show that exploiting cross-lingual text correlations improves summarizer performance.
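The abstract names two ingredients, aligned word embeddings and a graph-based sentence ranker, without detailing how they fit together. The Python sketch below shows one common way to combine them, and it is not the thesis's actual method. It assumes that each sentence is represented by the average of its word vectors, taken from embeddings already aligned into one shared space. It also assumes a TextRank-style PageRank over a cosine-similarity graph of sentences. The toy 3-dimensional vectors, the function names (`sentence_vector`, `rank_sentences`, `summarize`) and the damping factor 0.85 are all hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def sentence_vector(tokens: List[str], embeddings: Dict[str, Vector]) -> Optional[Vector]:
    """Average the aligned word vectors of a sentence, skipping unknown words."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_sentences(sentences, embeddings, damping=0.85, iterations=50) -> List[float]:
    """TextRank-style PageRank over a cosine-similarity sentence graph (assumed design)."""
    vecs = [sentence_vector(s, embeddings) for s in sentences]
    n = len(sentences)
    # Edge weights: non-negative cosine similarity between sentence vectors; no self-loops.
    # Because vectors share one aligned space, sentences in different languages can be linked.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and vecs[i] is not None and vecs[j] is not None:
                w[i][j] = max(0.0, cosine(vecs[i], vecs[j]))
    out_weight = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence passes its score to neighbours in proportion to edge weight;
        # isolated sentences (no positive similarity) receive only (1 - damping) / n.
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, embeddings, k: int) -> List[int]:
    """Indices of the k top-ranked sentences, returned in document order."""
    scores = rank_sentences(sentences, embeddings)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Hypothetical vectors, pretending English and Italian words were already
# mapped into one shared (aligned) embedding space.
EMBEDDINGS = {
    "cat": [1.0, 0.0, 0.0], "gatto": [0.9, 0.1, 0.0],
    "sleeps": [0.0, 1.0, 0.0], "dorme": [0.1, 0.9, 0.0],
    "stock": [0.0, 0.0, 1.0], "market": [0.0, 0.1, 0.9],
}
SENTENCES = [
    ["the", "cat", "sleeps"],  # English
    ["il", "gatto", "dorme"],  # Italian
    ["a", "cat", "sleeps"],    # English
    ["stock", "market"],       # English, off-topic
]

if __name__ == "__main__":
    print(summarize(SENTENCES, EMBEDDINGS, k=2))
```

In this toy run the Italian sentence lies close to its English counterparts in the shared space, so the three form a strongly connected cluster, while the off-topic sentence ranks last. A real system would also need tokenization, stop-word handling and some control over redundancy among the selected sentences.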

Supervisors: Luca Cagliero, Paolo Garza
Academic year: 2019/20
Publication type: Electronic
Number of pages: 83
Degree programme: Master's degree programme in Ingegneria Informatica (Computer Engineering)
Degree class: New regulations > Master of Science > LM-32 - Computer Systems Engineering
Collaborating companies: Unspecified
URI: http://webthesis.biblio.polito.it/id/eprint/14348