Politecnico di Torino

Fine-grained Named Entity Recognition using Ontology-Guided Knowledge Graphs

Hadi Nejabat

Fine-grained Named Entity Recognition using Ontology-Guided Knowledge Graphs.

Supervisor: Andrea Bottino. Politecnico di Torino, Master's degree programme (Corso di laurea magistrale) in Data Science and Engineering, 2023

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Fine-Grained Named Entity Recognition (FG-NER) has received considerable attention in recent years. Scientific literature, academia, and real-world analysis need fine-grained NER tools to categorize and process a wide range of information and semantics. Since the advent of transformer models in NLP, several studies have shown that transformer-based NER models perform considerably better than the previous state-of-the-art methods. According to the literature, however, many of the best-performing NER models are limited to coarse-grained entity labels, with fewer than 10 categories, and there is limited research on classifying named entities into finer, more detailed subgroups. Such coarse labels are far from sufficient for downstream tasks such as improving automated QA systems or powering recommender systems. Moreover, a series of studies in this area indicates that fine-grained NER suffers mostly from a lack of sufficient training data.

This thesis is also motivated by the fact that, although FG-NER still receives a lot of research coverage, it usually lacks the flexibility to produce convincing results in newly introduced domains. Furthermore, the literature focuses on KB-matching and distantly supervised learning, which rely on heavy models and solve the problem only partially.

We present the Ontology-Guided Named Entity Recognition model (OG-NER), a framework that improves FG-NER through an ontology-guided technique. It introduces a new way to use knowledge graphs to gather new named entities from knowledge bases and to perform semi-supervised learning, avoiding the challenges mentioned above. An additional goal is a model that can be easily re-trained and fine-tuned with limited up-front human effort on training data annotation, which is an expensive task.

The system is trained both on a specific domain, educational background, and on the benchmark open domain of newswire, and the results are compared with two of the most cited state-of-the-art NER models.

Supervisors: Andrea Bottino
Academic year: 2022/23
Publication type: Electronic
Number of pages: 96
Degree programme: Master's degree programme (Corso di laurea magistrale) in Data Science and Engineering
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: Anna & Hubert Labs AB
URI: http://webthesis.biblio.polito.it/id/eprint/26827