Structural-Semantic Dynamic Graph Learning for Document Visual QA

Xiao Huan

Structural-Semantic Dynamic Graph Learning for Document Visual QA.

Supervisors: Luca Cagliero, Lorenzo Vaiani, Davide Napolitano. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2025

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

With advances in Natural Language Processing (NLP) and Computer Vision (CV), Document Visual Question Answering (Document VQA) has become an important research area in both industry and academia. Visual documents are documents that contain various elements, such as images, tables, text paragraphs, and graphs. The challenge stems from their multimodal nature and complex structure: text and images must be processed together, often across multiple pages. Traditional question answering techniques are primarily designed for text-only or image-only inputs, making them ineffective for questions that require both textual and visual elements. Even when these modalities are integrated, gaps can remain in how they interact and align.

Some models have focused on capturing relations to handle the complex structure of documents, but these approaches are limited to intra-page relationships and rely on static weight aggregation for nodes.

Publication type

Electronic

URI

https://webthesis.biblio.polito.it/id/eprint/35233
