Multi-Document Summarization Driven by Domain-Specific Embedding
Mattia Cara
Multi-Document Summarization Driven by Domain-Specific Embedding.
Rel. Luca Cagliero, Paolo Garza. Politecnico di Torino, Master of Science program in Computer Engineering, 2020
Licence: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Word embeddings are nowadays widely deployed in a large number of Natural Language Processing tasks. A word embedding maps each word belonging to a corpus into a vector space, preserving semantic and syntactic properties. Embeddings are used in different applications such as sentiment analysis, topic extraction, Part-of-Speech tagging and, of course, document summarization. The focus of this thesis is on this last task: the goal is to extract, given a collection of articles, the most relevant sentences, so as to provide the reader with only a limited, but hopefully the most meaningful, set of information. In particular, the scope of this work is to empirically show that a domain-specific word embedding produces a better summary than a general-purpose one.
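The extractive approach described above can be sketched in a minimal, illustrative form: score each sentence by the similarity of its average word vector to the document centroid, then keep the top-ranked sentences. The toy two-dimensional vectors below are invented for illustration; the thesis itself trains real domain-specific embeddings on topical corpora.

```python
import numpy as np

# Toy word vectors (illustrative values only, not from a trained model).
embeddings = {
    "cells":   np.array([0.9, 0.1]),
    "divide":  np.array([0.8, 0.3]),
    "rapidly": np.array([0.7, 0.2]),
    "the":     np.array([0.1, 0.1]),
    "weather": np.array([0.1, 0.9]),
    "is":      np.array([0.1, 0.2]),
    "nice":    np.array([0.2, 0.8]),
}

def sentence_vector(sentence):
    # Represent a sentence as the average of the vectors of its known words.
    vecs = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    return np.mean(vecs, axis=0)

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def summarize(sentences, k=1):
    # Rank sentences by similarity to the document centroid and keep the top k.
    vectors = [sentence_vector(s) for s in sentences]
    centroid = np.mean(vectors, axis=0)
    ranked = sorted(sentences,
                    key=lambda s: cosine(sentence_vector(s), centroid),
                    reverse=True)
    return ranked[:k]
```

For example, `summarize(["cells divide rapidly", "the weather is nice"], k=1)` keeps the sentence closest to the centroid of the two.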
The idea behind this is that training a word embedding on documents about the same topic produces a better representation of all the words related to that subject, because, compared with a non-specific text, those words occur more often and are used in a more specific context.
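The intuition that a well-trained embedding places related domain terms close together can be illustrated with cosine similarity. The vectors below are hand-picked toy values, not the output of any real model; a genuine domain-specific embedding would be learned from a topical corpus.

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors: close to 1 for
    # semantically related words, near 0 for unrelated ones.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-dimensional embeddings (illustrative values only).
embeddings = {
    "tumor":  np.array([0.9, 0.1, 0.0]),
    "lesion": np.array([0.8, 0.2, 0.1]),
    "guitar": np.array([0.0, 0.1, 0.9]),
}

# Related medical terms score high; an unrelated word scores low.
print(cosine_similarity(embeddings["tumor"], embeddings["lesion"]))
print(cosine_similarity(embeddings["tumor"], embeddings["guitar"]))
```

In a domain-specific model trained on, say, medical articles, pairs like "tumor" and "lesion" should sit closer together than in a general-purpose model, which is the property the thesis exploits.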
