Federico Lorenzo Pes
Analysis of semi-structured data based on Named Entity Recognition and Computer Vision techniques.
Supervisor: Luca Cagliero. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2023
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Extracting information from invoices is a highly recurrent task and is therefore well suited to automation. The main challenge is that the text layout of an invoice varies from issuer to issuer; we refer to this type of data as semi-structured. Hence, while rule-based techniques may provide excellent results for a given layout, they must be manually adapted to each new one. In Natural Language Processing (NLP), this task can be framed as Named Entity Recognition (NER), a token classification task that detects one or more tokens and assigns them a label corresponding to a real-world entity.
While word embedding and transformer-based techniques dominate the NLP landscape, they struggle with this type of data, since the information to be extracted depends not only on the context of each word but also on the document’s structure.
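The contrast drawn in the abstract between layout-specific rules and token classification can be illustrated with a minimal sketch. It is not taken from the thesis: the two invoice snippets, the regular expression, the label names (`ISSUER`, `DATE`, `INVOICE_NUMBER`, `TOTAL`) and the BIO tagging scheme are all assumptions chosen for illustration. The first part shows a rule that works for one layout and fails on another; the second shows how the same fields could be expressed as per-token labels, which is the form a NER model would be trained to predict.

```python
import re

# Hypothetical invoice snippets from two issuers with different layouts.
INVOICE_A = "Invoice No: 2023-001\nDate: 12/03/2023\nTotal: 150.00 EUR"
INVOICE_B = "ACME S.r.l.\n12/03/2023\nDocument 2023-001 amount due 150.00 EUR"

# Rule written for layout A only: the value must follow a fixed keyword.
RULE_A = re.compile(r"Invoice No:\s*(\S+)")


def extract_invoice_number(text):
    """Return the invoice number if the layout-A rule matches, else None."""
    match = RULE_A.search(text)
    return match.group(1) if match else None


def bio_tags(tokens, entities):
    """Build BIO tags from entity spans given as (start, end_exclusive, label)."""
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags


# The rule succeeds on the layout it was written for and fails on the other.
number_a = extract_invoice_number(INVOICE_A)
number_b = extract_invoice_number(INVOICE_B)

# NER framing of layout B: whitespace tokens with hand-annotated spans
# (the annotation is an assumption made for this example).
tokens = INVOICE_B.split()
entities = [(0, 2, "ISSUER"), (2, 3, "DATE"), (4, 5, "INVOICE_NUMBER"), (7, 9, "TOTAL")]
tags = bio_tags(tokens, entities)

for token, tag in zip(tokens, tags):
    print(f"{token:12s} {tag}")
```

In this sketch the rule has to be rewritten for every new issuer, whereas the token-label representation stays the same across layouts; only the model that predicts the labels has to cope with the variation.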
