
Hybrid Neural Knowledge Graph-to-Text and Text-to-Text Generation

Marco Saponara

Hybrid Neural Knowledge Graph-to-Text and Text-to-Text Generation. Supervisors: Tatiana Tommasi, Leo Wanner. Politecnico di Torino, Master's degree programme in Mathematical Engineering, 2022.

PDF (Tesi_di_laurea) - Thesis (2MB)
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

In this document, we propose a novel Natural Language Generation (NLG) task: hybrid knowledge graph-to-text and text-to-text generation. The task aims to enrich the text generated from a knowledge graph (KG) encoded in the Resource Description Framework (RDF) format with relevant information extracted from a complementary textual context. It is particularly useful for small ontologies on topics for which richer textual resources are available. To solve this task, we present a neural system based on a three-step pipeline: pure KG-to-text generation, content selection from the context, and, finally, the combination of the KG verbalization and the selected information into a fluent, cohesive text. Each step is based on the Transformer architecture: the first and third steps use a fine-tuned T5 model, and the second uses BERT. The KG-to-text model is fine-tuned on the WebNLG corpus; the other two models are trained on two custom datasets derived from WebNLG. The generated texts are evaluated with the syntactic log-odds ratio (SLOR), a reference-free, model-dependent fluency metric, and with a questionnaire-based human evaluation along four dimensions: coherence, grammaticality, faithfulness, and informativeness. Overall, the generated texts reach good levels of grammatical correctness and informativeness, but there is room for improvement in coherence and faithfulness.
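
The three-step pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the fine-tuned T5 and BERT models are replaced by plain Python callables (`kg_to_text`, `relevance`, `fuse` are hypothetical names), the `<S>/<P>/<O>` triple linearization is an assumed input convention, and content selection is assumed to be a per-sentence relevance score compared against a hypothetical `threshold`.

```python
from typing import Callable, List, Tuple

# An RDF triple: (subject, predicate, object).
Triple = Tuple[str, str, str]


def linearize(triples: List[Triple]) -> str:
    """Flatten RDF triples into one input string for a seq2seq model.

    The <S>/<P>/<O> markers are an assumed convention for this sketch.
    """
    parts = []
    for s, p, o in triples:
        # WebNLG-style entity names use underscores; turn them into spaces.
        parts.append(f"<S> {s.replace('_', ' ')} <P> {p} <O> {o.replace('_', ' ')}")
    return " ".join(parts)


def hybrid_generate(
    triples: List[Triple],
    context: List[str],
    kg_to_text: Callable[[str], str],         # stand-in for the step-1 T5 model
    relevance: Callable[[str, str], float],   # stand-in for the step-2 BERT selector
    fuse: Callable[[str, List[str]], str],    # stand-in for the step-3 T5 model
    threshold: float = 0.5,                   # hypothetical selection cut-off
) -> str:
    # Step 1: verbalize the knowledge graph on its own.
    kg_text = kg_to_text(linearize(triples))
    # Step 2: keep only context sentences scored as relevant to that verbalization.
    selected = [sent for sent in context if relevance(kg_text, sent) >= threshold]
    # Step 3: combine the verbalization and the selected content into one text.
    return fuse(kg_text, selected)


# Toy usage with trivial stand-ins in place of the neural models.
triples = [("Alan_Bean", "occupation", "Test_pilot")]
context = ["Alan Bean flew on Apollo 12.", "The weather was mild."]
out = hybrid_generate(
    triples,
    context,
    kg_to_text=lambda x: "Alan Bean was a test pilot.",
    relevance=lambda kg, s: 1.0 if "Alan Bean" in s else 0.0,
    fuse=lambda kg, sel: " ".join([kg] + sel),
)
print(out)  # Alan Bean was a test pilot. Alan Bean flew on Apollo 12.
```

Passing the models in as callables keeps each stage swappable, which mirrors the modular design of a pipeline in which each step is trained separately.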

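The syntactic log-odds ratio mentioned in the abstract can be illustrated with its standard definition, SLOR(S) = (ln p_M(S) - ln p_u(S)) / |S|, where p_M is a language model's probability of sentence S and p_u(S) is the product of the unigram probabilities of its tokens. Subtracting the unigram term keeps rare words from being penalized, and dividing by |S| removes the length bias. This is a sketch only: the per-token language-model log-probabilities below are hypothetical numbers rather than outputs of the model used in the thesis, and the unigram model is an unsmoothed maximum-likelihood estimate (an assumption; unseen tokens raise `KeyError`).

```python
import math
from collections import Counter
from typing import Dict, List


def unigram_log_probs(corpus: List[List[str]]) -> Dict[str, float]:
    """Maximum-likelihood unigram log-probabilities from a tokenized corpus."""
    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values())
    return {tok: math.log(c / total) for tok, c in counts.items()}


def slor(tokens: List[str], lm_log_probs: List[float], unigram_lp: Dict[str, float]) -> float:
    """SLOR(S) = (ln p_M(S) - ln p_u(S)) / |S|.

    lm_log_probs[i] is the language model's log-probability of tokens[i]
    given its prefix, so their sum is ln p_M(S).
    """
    if not tokens or len(tokens) != len(lm_log_probs):
        raise ValueError("need at least one token and one LM log-probability per token")
    log_p_model = sum(lm_log_probs)
    # The unigram probability of the sentence is the product of token probabilities.
    log_p_unigram = sum(unigram_lp[tok] for tok in tokens)
    return (log_p_model - log_p_unigram) / len(tokens)


corpus = [["a", "b"], ["a", "c"]]
uni = unigram_log_probs(corpus)  # p(a) = 0.5, p(b) = p(c) = 0.25
score = slor(["a", "b"], [math.log(0.6), math.log(0.5)], uni)
print(round(score, 3))  # 0.438, i.e. ln(0.3 / 0.125) / 2
```

Because SLOR needs no reference text, it can score generated outputs for which no gold verbalization exists, which matters for a new task without reference data for the final, enriched texts.
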
Supervisors: Tatiana Tommasi, Leo Wanner
Academic year: 2021/22
Publication type: Electronic
Number of pages: 49
Degree programme: Master's degree programme in Mathematical Engineering
Degree class: New system > Master's degree > LM-44 - Mathematical-Physical Modelling for Engineering
Co-supervising institution: ICREA, Catalan Institution for Research and Advanced Studies (Spain)
Collaborating organizations: ICREA, Catalan Institution for Research and Advanced Studies
URI: http://webthesis.biblio.polito.it/id/eprint/21933