
Multi-Document Summarization Driven by Domain-Specific Embedding

Mattia Cara

Multi-Document Summarization Driven by Domain-Specific Embedding.

Supervisors: Luca Cagliero, Paolo Garza. Politecnico di Torino, Master's degree in Ingegneria Informatica (Computer Engineering), 2020

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Word embeddings are now widely used in a large number of Natural Language Processing tasks. A word embedding maps each word of a corpus into a vector space while preserving its semantic and syntactic properties. Embeddings are used in applications such as sentiment analysis, topic extraction, Part-Of-Speech tagging and document summarization. This thesis focuses on the last of these: given a collection of articles, the goal is to extract the most relevant sentences, so that the reader receives a limited amount of information that is nevertheless the most meaningful.

In particular, this work aims to show empirically that a domain-specific word embedding produces a better summary than a general-purpose one. The underlying idea is that training a word embedding on documents about a single topic yields a better representation of the words related to that topic because, compared with non-specific text, those words occur more often and are used in a more specific context. In addition, a domain-specific embedding handles words with multiple meanings better: instead of giving every meaning the same weight, it gives more relevance to the meaning tied to the topic. The thesis is split into two main parts: the first covers producing a domain-specific word embedding and assessing its quality; the second applies that embedding to a downstream task, namely multi-document summarization.
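
To make the extraction step described in the abstract concrete, here is a minimal sketch of embedding-based sentence selection. It is an illustration under stated assumptions, not the method used in the thesis: the abstract does not say how sentences are scored, so the sketch assumes centroid-based ranking with cosine similarity, and the three-dimensional vectors in `TOY_EMBEDDING` are invented by hand. The names `sentence_vector`, `cosine` and `summarize` are hypothetical helpers.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real embedding would be trained on a (domain-specific) corpus.
TOY_EMBEDDING = {
    "bank":    [0.9, 0.1, 0.0],
    "loan":    [0.8, 0.2, 0.0],
    "rates":   [0.7, 0.3, 0.1],
    "river":   [0.0, 0.1, 0.9],
    "weather": [0.1, 0.0, 0.8],
}


def sentence_vector(sentence, embedding):
    """Average the vectors of the words that the embedding knows."""
    vectors = [embedding[w] for w in sentence.lower().split() if w in embedding]
    if not vectors:
        return None  # no known word: the sentence cannot be scored
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def summarize(documents, embedding, k=1):
    """Return the k sentences closest to the centroid of the collection.

    Assumption: sentences are split naively on '.', and relevance is the
    cosine similarity between a sentence vector and the collection centroid.
    """
    sentences = [s.strip() for doc in documents for s in doc.split(".") if s.strip()]
    embedded = [(s, sentence_vector(s, embedding)) for s in sentences]
    embedded = [(s, v) for s, v in embedded if v is not None]
    if not embedded:
        return []
    dim = len(embedded[0][1])
    centroid = [sum(v[i] for _, v in embedded) / len(embedded) for i in range(dim)]
    ranked = sorted(embedded, key=lambda sv: cosine(sv[1], centroid), reverse=True)
    return [s for s, _ in ranked[:k]]


documents = [
    "The bank raised loan rates. The weather was mild",
    "Loan rates at the bank keep rising",
]
summary = summarize(documents, TOY_EMBEDDING, k=1)
print(summary)
```

In this toy collection the selected sentence is one of the two about loan rates, because the weather sentence lies far from the centroid. Replacing `TOY_EMBEDDING` with vectors trained on in-domain documents is the kind of substitution the thesis evaluates.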

Supervisors: Luca Cagliero, Paolo Garza
Academic year: 2019/20
Publication type: Electronic
Number of pages: 86
Subjects:
Degree course: Master's degree in Ingegneria Informatica (Computer Engineering)
Degree class: New system > Master's degree > LM-32 - Ingegneria Informatica (Computer Engineering)
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/14350