GETALP-MISTRAL7B: A Clinical Large Language Model For Automated Discharge Documentation From Electronic Health Records

Michele Pantaleo

GETALP-MISTRAL7B: A Clinical Large Language Model For Automated Discharge Documentation From Electronic Health Records.

Supervisors: Gabriella Olmo, Didier Schwab, Lorraine Goeuriot. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2025

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

This work presents GETALP-Mistral7B, a clinical large language model (LLM) designed to automatically generate discharge documentation. Leveraging patients’ Electronic Health Records (EHRs), the model generates two central sections of a discharge summary: the Hospital Course (HC) and the Discharge Instructions (DI). EHRs are usually stored in forms or tables that differ across hospitals. To ensure interoperability across heterogeneous systems, EHRs were transformed into two task-specific textual formats: the Diary for generating the Hospital Course, and the Patient Summary for producing the Discharge Instructions. GETALP-Mistral7B is fine-tuned from Asclepius-Mistral-7B using 104,528 encounters from the Beth Israel Deaconess Medical Center (MIMIC-IV). Quantized Low-Rank Adaptation (QLoRA) is used to fine-tune the model separately for each section, yielding two specialized lightweight adapters while keeping the base model weights frozen. GETALP-Mistral7B is benchmarked against models from Discharge-Me!, the first shared task on clinical text generation. Evaluation is conducted using the challenge’s framework, which consists of a held-out set of 250 examples and eight NLP metrics assessing lexical similarity, semantic adequacy, and factual correctness. GETALP-Mistral7B achieves an overall score of 0.398, establishing it as the state of the art for generating discharge documentation on this benchmark.
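
The adapter idea mentioned in the abstract (a frozen base model plus one small trainable low-rank update per section) can be illustrated with a toy sketch. This is not the thesis implementation: it uses a single 8x8 weight matrix in pure Python, omits the quantization step of QLoRA entirely, and every name in it (LoRALinear, hc_adapter, di_adapter, the rank and alpha values) is a hypothetical chosen for illustration.

```python
import random


def matvec(M, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight matrix W plus a trainable low-rank update B @ A."""

    def __init__(self, W, rank, alpha, rng):
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen base weights, shared by reference across adapters
        self.scale = alpha / rank
        # Usual LoRA initialisation: A is random, B is zero,
        # so the low-rank update contributes nothing before training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        # Only A and B would be updated during fine-tuning.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


rng = random.Random(0)
d_out, d_in, rank = 8, 8, 2
W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]

# Two adapters over the same frozen base, one per section (illustrative names).
hc_adapter = LoRALinear(W, rank=rank, alpha=4, rng=rng)
di_adapter = LoRALinear(W, rank=rank, alpha=4, rng=rng)

x = [1.0] * d_in
base_out = matvec(W, x)

# Stand-in for a training step that changes only the HC adapter.
hc_adapter.B[0][0] = 0.5

print("trainable per adapter:", hc_adapter.trainable_params(), "vs base:", d_out * d_in)
print("DI adapter still matches base:", di_adapter.forward(x) == base_out)
print("HC adapter differs from base:", hc_adapter.forward(x) != base_out)
```

In this sketch each adapter holds rank x (d_in + d_out) trainable values against d_in x d_out in the base matrix. The saving is modest at toy size and grows with the matrix dimensions, which is why the adapters in the abstract are described as lightweight.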

Supervisors: Gabriella Olmo, Didier Schwab, Lorraine Goeuriot
Academic year: 2025/26
Publication type: Electronic
Number of pages: 134
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Joint supervision institution: INSTITUT NATIONAL POLYTECHNIQUE DE GRENOBLE (INPG) - ENSIMAG (FRANCE)
Collaborating organizations: Université Grenoble Alpes (UGA)
URI: http://webthesis.biblio.polito.it/id/eprint/37897