Generative Artificial Intelligence for Insurance: developing a chatbot for Multi-Document Summarization of Insurance Products

Ivan Brancati

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree program in Ingegneria Informatica (Computer Engineering), 2025.

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Natural Language Processing (NLP) is a widely studied topic in computer science, and it is especially relevant today, as Generative Artificial Intelligence (GenAI) is driving technological progress. GenAI has applications across many domains and supports multiple NLP tasks, such as Text Classification, Sentiment Analysis, and Text Generation. Multi-Document Summarization (MDS) is particularly useful in an era of information overload: its objective is to automatically produce a single summary from multiple documents. MDS is usually applied to documents describing news or events, but many other fields could benefit from it, and its application to the economic domain remains largely unexplored.

This work addresses the challenge of producing a good-quality summary of multiple documents that describe insurance products. To this end, we adapted the Chain-of-Events (CoE) approach, which has achieved strong results on news summarization, to the insurance sector, using LangChain APIs and ChatGPT. We created a new dataset of 16 insurance products from the Unipol company; its documents share a common structure, with both general sections and product-specific sections. Each document was split into multiple chunks to reproduce a multi-document setting. The main challenge was adapting and optimizing prompts originally designed to summarize news or events. We tried both the Stuff strategy (passing all chunks to the model in a single prompt) and the Map-Reduce strategy (summarizing each chunk separately, then combining the partial summaries), and we ultimately chose Map-Reduce, since it proved more effective.

Another major challenge was finding a valid methodology to evaluate the summaries. Traditional text-generation metrics such as BLEU and ROUGE are reference-based: they can only be used in a supervised setting, where the dataset includes human-written reference texts, which our dataset does not provide. For this reason, we used the G-Eval framework, which employs LLMs as evaluators, based on four key metrics: Coherence, Consistency, Fluency, and Relevance.
For each metric, we defined a dedicated prompt asking the LLM to assign a score between 0 and 1, and we then computed the average of the four scores. The final model achieved an average score above 0.8.

Having shown that CoE can be adapted to the insurance sector, we provided a complete solution for a real-world scenario: an interactive chatbot that helps customers summarize and compare insurance products. We developed the web application with the Streamlit framework, providing two sections: the Summarizer, which generates a summary of the documents related to a specific insurance product, and the Comparator, which compares two insurance products, highlighting their similarities and differences. The Summarizer section also includes a Retrieval-Augmented Generation (RAG) system that answers product-related questions by retrieving salient information directly from the documents, extending what users can do. We used GPT-4o mini both to summarize documents and for conversation, and GPT-3.5 for the evaluation metrics.

In conclusion, we showed that MDS in the insurance domain is feasible using CoE prompting and LLMs, offering both general examples and a concrete use case; this involved creating a new dataset and a new evaluation methodology. The solution can be extended in the future: it could be applied to other sectors, such as finance and healthcare, or use newer LLMs, including models other than OpenAI's GPT. Furthermore, the dataset could be expanded with human-written reference summaries, enabling different evaluation techniques.
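
The Map-Reduce summarization over chunked documents and the averaged four-metric evaluation described in the abstract can be illustrated with a minimal, library-free Python sketch. It is not the thesis code and does not use LangChain's API: it assumes a hypothetical `llm` callable (prompt in, text out) standing in for the OpenAI model calls, and the prompt templates, the chunk size, and the helper names `split_into_chunks`, `map_reduce_summarize`, and `average_score` are illustrative assumptions. The per-metric scores are taken as given, since the actual G-Eval prompts are not reproduced here.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical LLM interface: takes a prompt, returns generated text.
# In the thesis this role is played by OpenAI models called through LangChain;
# here any callable works, so the sketch runs without API keys.
LLM = Callable[[str], str]

# Illustrative prompt templates (assumptions): the actual Chain-of-Events
# prompts adapted in the thesis are not reproduced here.
MAP_PROMPT = (
    "Extract the key facts (coverages, exclusions, conditions) from this "
    "excerpt of an insurance document:\n{chunk}"
)
REDUCE_PROMPT = (
    "Merge the following partial summaries into one coherent summary of the "
    "insurance product:\n{partials}"
)

# The four evaluation metrics named in the abstract.
METRICS = ("coherence", "consistency", "fluency", "relevance")


def split_into_chunks(text: str, max_words: int = 200) -> List[str]:
    """Split one document into word-bounded chunks, so that a single
    product document behaves like a multi-document input."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def map_reduce_summarize(chunks: List[str], llm: LLM) -> str:
    """Map step: summarize each chunk independently.
    Reduce step: combine all partial summaries with one final call."""
    partials = [llm(MAP_PROMPT.format(chunk=chunk)) for chunk in chunks]
    return llm(REDUCE_PROMPT.format(partials="\n".join(partials)))


def average_score(scores: Dict[str, float]) -> float:
    """Average the four per-metric scores, each expected in [0, 1]."""
    for metric in METRICS:
        if metric not in scores:
            raise ValueError(f"missing score for {metric}")
        if not 0.0 <= scores[metric] <= 1.0:
            raise ValueError(f"{metric} score out of range: {scores[metric]}")
    return mean(scores[m] for m in METRICS)


if __name__ == "__main__":
    # Stand-in LLM that only reports how many words each prompt contained.
    def fake_llm(prompt: str) -> str:
        return f"<summary of a {len(prompt.split())}-word prompt>"

    document = "word " * 450
    chunks = split_into_chunks(document)
    print(len(chunks))  # 3 chunks: 200 + 200 + 50 words
    print(map_reduce_summarize(chunks, fake_llm))
    print(average_score({"coherence": 0.9, "consistency": 0.8,
                         "fluency": 0.95, "relevance": 0.75}))  # approximately 0.85
```

In this sketch Map-Reduce makes one LLM call per chunk plus one final reduce call, so each individual prompt stays small; the Stuff strategy would instead place every chunk in a single prompt.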

Supervisor: Paolo Garza
Academic year: 2024/25
Publication type: Electronic
Number of pages: 135
Degree program: Master's degree program in Ingegneria Informatica (Computer Engineering)
Degree class: New regulations > Master's degree > LM-32 - Computer Engineering (Ingegneria Informatica)
Collaborating companies: DATA Reply S.r.l. con Unico Socio
URI: http://webthesis.biblio.polito.it/id/eprint/36361