Daniele Mansillo
Multimodal RAG for Slide Presentations with Synthetic Data Generation and Anonymization.
Supervisors: Daniele Apiletti, Simone Monaco. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2025
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
In recent years there has been a surge in the development and adoption of Retrieval-Augmented Generation (RAG) pipelines, as they constitute a cost-effective, flexible, and highly customizable way to leverage the advantages of LLMs on private and custom data. While modern RAG pipelines can work with almost any type of data, existing document processing systems focus predominantly on textual content, often ignoring visual elements. This text-centric approach may suffice when text constitutes the main information carrier, but it fails to extract all meaningful insights from documents like slide presentations, where content is equally distributed across text, charts, images, and tables that often interact with each other to convey complete information.
Given the rising popularity and performance of multimodal models and their limited integration into RAG pipelines, we chose to bridge this gap by building an effective RAG pipeline capable of processing slide presentations in PDF format and accurately answering queries whose answers are spread across different data modalities.
