Andrea Detommaso
Semantics-Aware VQA, a scene-graph-based approach to enable commonsense reasoning.
Supervisors: Elena Maria Baralis, Andrea Pasini. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2020
PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
In this work we investigate a novel approach to the Visual Question Answering task, which consists of answering a question expressed in natural language about the content of an image. Most existing approaches share a common limitation: a poor ability to reason about the image’s context. To tackle this issue, we consider the use of scene graphs derived from images. A scene graph is a concise representation of an image in which nodes represent object entities and edges represent the relationships between them. Furthermore, we investigate the use of an ontological knowledge base as a way of improving the reasoning capacity of the system. The dataset used for our experiments is Visual7W, a collection of 40,000 questions related to the Visual Genome dataset, whose pictures are enriched with scene graphs and further annotations. Our empirical studies show that scene graphs can enhance the reasoning capacity of the system, especially for spatial reasoning. Moreover, our work shows that the use of an external knowledge base improves the ability of our system to infer the image context. |
---|---|
Supervisors: | Elena Maria Baralis, Andrea Pasini |
Academic year: | 2020/21 |
Publication type: | Electronic |
Number of pages: | 80 |
Subjects: | |
Degree programme: | Master's degree programme in Ingegneria Informatica (Computer Engineering) |
Degree class: | New system > Master's degree > LM-32 - INGEGNERIA INFORMATICA (Computer Engineering) |
Collaborating companies: | NOT SPECIFIED |
URI: | http://webthesis.biblio.polito.it/id/eprint/16672 |
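
Illustrative note: the abstract describes a scene graph as a structure whose nodes are object entities and whose edges are relationships between them. The following is a minimal sketch of that idea only, not the implementation used in the thesis. The class `SceneGraph`, its methods, and the toy objects and relations are all hypothetical and are not taken from Visual7W or Visual Genome.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneGraph:
    """Toy scene graph: nodes are labelled objects, edges are relationships."""

    # node id -> object label (hypothetical schema, chosen for illustration)
    objects: Dict[int, str] = field(default_factory=dict)
    # directed edges stored as (subject id, predicate, object id)
    relations: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, obj_id: int, label: str) -> None:
        self.objects[obj_id] = label

    def add_relation(self, subj_id: int, predicate: str, obj_id: int) -> None:
        # Both endpoints must already exist as nodes.
        if subj_id not in self.objects or obj_id not in self.objects:
            raise KeyError("both endpoints must be added as objects first")
        self.relations.append((subj_id, predicate, obj_id))

    def objects_related_to(self, subject_label: str, predicate: str) -> List[str]:
        """Labels of objects linked from `subject_label` by `predicate`."""
        return [
            self.objects[obj_id]
            for subj_id, pred, obj_id in self.relations
            if self.objects[subj_id] == subject_label and pred == predicate
        ]


# Made-up scene: a man riding a horse that stands on grass.
graph = SceneGraph()
graph.add_object(0, "man")
graph.add_object(1, "horse")
graph.add_object(2, "grass")
graph.add_relation(0, "riding", 1)
graph.add_relation(1, "on", 2)

# A question such as "What is the man riding?" becomes an edge lookup.
answer = graph.objects_related_to("man", "riding")
```

In this sketch a spatial question such as "What is the horse on?" is answered the same way, by following the `on` edge from the `horse` node.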