Itemset-based document summarization of multilingual collections driven by pre-trained word vectors

Yifu Zhao

Supervisor: Luca Cagliero. Politecnico di Torino, Master's degree course in Computer Engineering (Ingegneria Informatica), 2018

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

With the steady progress of information technology, it has become much easier for people to create, store, and disseminate information in electronic form. Billions of Internet users create quintillions of bytes every day. Although this wealth of information benefits people on several levels, the amount of information and knowledge is growing exponentially, which makes it difficult to find what is useful. To use information efficiently, a viable way to extract the critical content of a large collection of documents is to automatically generate readable, concise summaries containing the most relevant information. An automatic summary can capture the most relevant facts and common views in a few sentences, so that readers avoid getting lost in the large set of original texts.

Text mining refers to the acquisition of valuable information and knowledge from text data, which is a method in data mining