
Improving financial Question Answering via Generative Artificial Intelligence and embedding optimisation

Bruno Spaccavento

Improving financial Question Answering via Generative Artificial Intelligence and embedding optimisation.

Supervisor: Luca Cagliero. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2025

Access restricted to: staff users only until 24 October 2026 (embargo date).
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

The automatic analysis of financial reports and regulatory disclosures published by international banks is a complex task because of the size, structure, and technical language of these documents. This thesis presents a system that uses artificial intelligence to answer natural-language questions about such documents, helping to extract key financial information such as CET1 (Common Equity Tier 1) ratios, total assets, and the classification of assets and liabilities. The approach follows a two-step strategy of retrieval followed by generation. First, each document is divided into smaller parts, called "chunks", which are converted into numerical representations (embeddings) and stored in a database. When a user asks a question, the system searches this database for the most relevant chunks. Then, a language model uses the selected chunks to generate a clear and complete answer. Evaluated on a real-world dataset, the proposed method raised overall answer accuracy from 52.0% to 75.3%, an absolute gain of 23.3 percentage points. This result demonstrates the effectiveness of combining advanced language models with more informative document representations. While the system highlights the potential of generative AI to support financial analysis, some challenges remain. The non-deterministic behaviour of certain models can affect the reproducibility of results, and generating contextual information for each document section adds computational costs that may be a barrier for smaller companies.
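The retrieve-then-generate pipeline summarised in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the thesis. Fixed-size word windows stand in for the chunking strategy. Token-count vectors compared with cosine similarity stand in for a neural embedding model, and a plain list stands in for the database. The hypothetical `contextualize` helper prepends a fixed string where the thesis generates contextual information for each section. The generation step is reduced to assembling the prompt that would be sent to a language model. All function names are illustrative.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into fixed-size word windows; consecutive windows share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for a neural embedding model: lower-cased token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (missing tokens count as 0)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def contextualize(chunk: str, context: str) -> str:
    """Hypothetical helper: prepend context to a chunk before embedding.
    Assumption: a fixed string here; a real system might generate it per chunk with an LLM."""
    return f"{context}: {chunk}" if context else chunk


def build_index(chunks: list[str], context: str = "") -> list[tuple[str, Counter]]:
    """Store the raw chunk text alongside the embedding of its contextualized version."""
    return [(chunk, embed(contextualize(chunk, context))) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble the retrieved chunks into a prompt for the generating language model."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the passages below.\n"
        f"{numbered}\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Invented toy sentences, not taken from any real report.
    chunks = [
        "The CET1 ratio is reported in the capital adequacy section.",
        "Total assets are shown in the consolidated balance sheet.",
        "Liabilities are classified by measurement category.",
    ]
    index = build_index(chunks, context="Bank annual report")
    question = "What is the CET1 ratio?"
    print(build_prompt(question, retrieve(index, question, k=1)))
```

Embedding the contextualized text while returning the raw chunk keeps the prompt compact. Replacing `embed` with a real embedding model and the list with a vector store would leave the structure of `retrieve` unchanged. Only the similarity computation and storage would move.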

Supervisor: Luca Cagliero
Academic year: 2025/26
Publication type: Electronic
Number of pages: 71
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New regulations > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: INTESA SANPAOLO INNOVATION CENTER SPA
URI: http://webthesis.biblio.polito.it/id/eprint/37892