
Document Intelligence with Multi-Modal Large Language Models

Pietro Montresori

Document Intelligence with Multi-Modal Large Language Models.

Supervisors: Lia Morra, Fabrizio Lamberti. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024

Abstract:

In today's data-driven business environment, organizations must access and leverage diverse data sources to enhance performance, predict trends, and drive informed decision-making. Advances in Artificial Intelligence (AI) have further enabled the use of well-managed data to train Machine Learning models, fostering competitiveness. However, a significant share of business data is stored in image-based documents such as contracts, invoices, and reports, which contain valuable information that traditional text-based methods cannot fully extract. To address this, traditional Optical Character Recognition (OCR) is increasingly paired with Deep Learning models capable of processing both textual and visual data, capturing complex relationships that standard methods may miss. This thesis presents a Proof of Concept (PoC) of a pipeline that uses OCR and a Multi-modal Language Model to extract critical data from invoices, focusing on fields such as subtotal, seller name, VAT code, and issue date.

Supervisors: Lia Morra, Fabrizio Lamberti
Academic year: 2024/25
Publication type: Electronic
Number of pages: 61
Additional information: Confidential thesis. Full text not available
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New system > Master's degree > LM-32 - Computer Engineering
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/34018