Politecnico di Torino

Language and Vision models for PET Assisted Reporting

Jacopo Bracci


Supervisors: Flavio Giobergia, Nicolo' Capobianco. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024

Abstract:

PET and CT scans are essential diagnostic tools that help physicians worldwide detect tumors and reach a diagnosis. Over time, advanced algorithms such as convolutional neural networks have been introduced to detect lesions automatically, leading to the development of lesion segmentation models. Although these models now help physicians improve diagnostic accuracy, they take only images as input and therefore use only the visual modality. However, physicians routinely write textual reports describing their findings. These reports contain comprehensive descriptions of the identified lesions, including important details such as anatomical location, dimensions, and notable characteristics. Moreover, this information is reliable because it is written by experienced physicians, which helps ensure that the descriptions are both accurate and clinically relevant. Clinical reports therefore contain rich information about the lesions, making text a valuable additional modality.

This thesis explores whether combining this textual information with the visual data from PET/CT scans can yield a more accurate lesion segmentation model. Specifically, this work focuses on building a pipeline that takes a PET/CT scan and the corresponding clinical report as input and produces a segmentation of the lesions. First, Large Language Models (LLMs) were used to extract structured information about the anatomical location of the lesions from the clinical reports. Next, organ segmentation tools were used to link this textual information to the images, bridging the gap between the text and image modalities. Finally, this information was combined with the PET/CT scans as input to a segmentation model that produces the lesion segmentation.

While this multimodal approach is promising, the results of this work highlight the current limitations of integrating textual and visual data for lesion segmentation. Challenges such as aligning the extracted information with the imaging data, together with the limitations and hallucinations of LLMs, made it difficult to improve segmentation accuracy. These findings underscore the need for further research to refine multimodal approaches and address these challenges before they can be applied effectively in clinical settings.
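The abstract does not say how the extracted anatomical locations are passed to the segmentation model, and the full text is restricted. The following is only a minimal, hypothetical Python sketch of one common way to link the two modalities, not the thesis's implementation. It assumes that the LLM answers in JSON with one `organ` field per lesion, that the organ segmentation tool outputs an integer label map on the same voxel grid as the PET/CT volumes, and that the label IDs in `ORGAN_LABELS` and the helper names are illustrative inventions. The organs mentioned in the report become a binary "location prior" mask, which is stacked with PET and CT as an extra input channel.

```python
import json

import numpy as np

# Hypothetical organ label IDs (assumption): a real organ segmentation tool
# defines its own label scheme.
ORGAN_LABELS = {"liver": 1, "spleen": 2, "lung": 3}


def parse_report_locations(llm_output: str) -> list[str]:
    """Parse an LLM answer assumed to look like {"lesions": [{"organ": "liver"}, ...]}."""
    data = json.loads(llm_output)
    organs: list[str] = []
    for lesion in data.get("lesions", []):
        organ = str(lesion.get("organ", "")).strip().lower()
        # Keep only organs the label map knows about; unknown or hallucinated
        # names are silently dropped, and duplicates are ignored.
        if organ in ORGAN_LABELS and organ not in organs:
            organs.append(organ)
    return organs


def location_prior(organ_map: np.ndarray, organs: list[str]) -> np.ndarray:
    """Binary mask equal to 1 inside every organ mentioned in the report."""
    ids = [ORGAN_LABELS[o] for o in organs]
    return np.isin(organ_map, ids).astype(np.float32)


def build_input(pet: np.ndarray, ct: np.ndarray, organ_map: np.ndarray,
                llm_output: str) -> np.ndarray:
    """Stack PET, CT and the text-derived prior as channels: shape (3, D, H, W)."""
    prior = location_prior(organ_map, parse_report_locations(llm_output))
    return np.stack([pet.astype(np.float32), ct.astype(np.float32), prior], axis=0)


# Toy example on a 2 x 4 x 4 volume (synthetic data, for illustration only).
organ_map = np.zeros((2, 4, 4), dtype=np.int16)
organ_map[0, :2, :2] = ORGAN_LABELS["liver"]   # 4 liver voxels
organ_map[1, 2:, 2:] = ORGAN_LABELS["spleen"]  # 4 spleen voxels
answer = '{"lesions": [{"organ": "Liver"}, {"organ": "pancreas"}]}'
x = build_input(np.ones((2, 4, 4)), np.zeros((2, 4, 4)), organ_map, answer)
print(x.shape)          # (3, 2, 4, 4)
print(int(x[2].sum()))  # 4: only the liver voxels are marked
```

A single binary prior channel is the simplest form of conditioning. It also illustrates the alignment problem mentioned above: a hallucinated or misnamed organ either disappears from the prior or, worse, highlights the wrong region.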

Supervisors: Flavio Giobergia, Nicolo' Capobianco
Academic year: 2024/25
Publication type: Electronic
Number of pages: 96
Additional information: Restricted thesis. Full text not available
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New regulations > Master's degree > LM-32 - Computer Engineering
Co-supervising institution: Siemens Healthineers (Germany)
Collaborating companies: Siemens Healthineers AG
URI: http://webthesis.biblio.polito.it/id/eprint/33027