Pietro Montresori
Document Intelligence with Multi-Modal Large Language Models.
Supervisors: Lia Morra, Fabrizio Lamberti. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024
Abstract: | In today's data-driven business environment, organizations must access and leverage diverse data sources to enhance performance, predict trends, and drive informed decision-making. Advances in Artificial Intelligence (AI) have further enabled the use of well-managed data to train Machine Learning models, fostering competitiveness. However, a significant share of business data is stored in image-based documents such as contracts, invoices, and reports, which contain valuable information that traditional text-based methods cannot fully extract. To address this, traditional Optical Character Recognition (OCR) is increasingly paired with Deep Learning models that process both textual and visual data and capture complex relationships that standard methods may miss. This thesis presents a Proof of Concept (PoC) of a pipeline that uses OCR and a Multi-modal Language Model to extract critical data from invoices, focusing on fields such as subtotal, seller name, VAT code, and issue date. |
---|---|
Supervisors: | Lia Morra, Fabrizio Lamberti |
Academic year: | 2024/25 |
Publication type: | Electronic |
Number of pages: | 61 |
Additional information: | Confidential thesis. Full text not available |
Subjects: | |
Degree programme: | Master's degree programme in Data Science And Engineering |
Degree class: | New regulations > Master's degree > LM-32 - Computer Engineering |
Collaborating companies: | Not specified |
URI: | http://webthesis.biblio.polito.it/id/eprint/34018 |