
Evaluating cross-domain adaptation strategies for foundation models: a study on Fluorescein Angiography

Roberto Pulvirenti

Evaluating cross-domain adaptation strategies for foundation models: a study on Fluorescein Angiography.

Supervisors: Filippo Molinari, Massimo Salvi, André Anjos, Oscar Jimenez. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

PDF (Tesi_di_laurea) - Thesis
Access restricted to: staff users only until 11 April 2026 (embargo date).
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

In recent years, large-scale general-purpose artificial intelligence models have achieved unprecedented success in various general-domain tasks. However, their direct application to specialized fields like medical imaging remains challenging due to the field’s inherent complexities. Unlike natural images, which feature everyday objects recognizable through common sense, medical images demand deep domain expertise for accurate interpretation. This makes the annotation process particularly challenging, as producing a large volume of high-quality labeled medical data requires the involvement of multiple experts, a resource that is often limited and difficult to scale. Moreover, while some medical imaging modalities, such as color fundus photography, benefit from extensive annotated datasets, others, like Fluorescein Angiography (FA), suffer from a severe lack of labeled data because they are specialized and less frequently used. This discrepancy in data availability across modalities highlights the need for cross-domain adaptation, which leverages knowledge from well-established imaging modalities to improve performance on underrepresented ones. Vision Transformers (ViTs) have shown significant promise in medical image analysis tasks such as reconstruction, segmentation, and classification, owing to their strong representational and generalization capabilities. Yet their dependency on large annotated datasets renders training from scratch impractical for medical applications. To overcome this hurdle, transfer learning from models pre-trained on large-scale datasets, such as ImageNet, has become a widely adopted approach. The success of these pre-trained ViTs, often referred to as visual foundation models, has inspired further research into adapting these architectures for more powerful, domain-specific applications.
In ophthalmology, this progress has led to the development of RETFound, a large ViT model pre-trained on nearly one million color fundus images using the Masked Image Modelling (MIM) technique, and its smaller variant, RETFoundGreen, which was pre-trained on only 75K images using a novel self-supervised technique called Token Reconstruction. The objective of this thesis is to systematically evaluate and compare different cross-domain adaptation techniques for large-scale vision models in the medical imaging domain, with a particular focus on FA. Specifically, it examines strategies for transferring knowledge to FA by comparing models pre-trained on color fundus images (e.g., RETFound and RETFoundGreen) with those pre-trained on natural images (e.g., ImageNet-trained ViTs) using three primary adaptation approaches: full fine-tuning (FFT), self-supervised pretraining, and parameter-efficient fine-tuning (PEFT). By evaluating how well these foundation models adapt to FA images, this study assesses their robustness, efficiency, and clinical applicability across ten distinct tasks using two datasets: the publicly available “3rd Aptos Competition” dataset and the “SOIN” dataset provided by the Jules-Gonin Ophthalmic Hospital. Special emphasis is placed on the role of self-supervised learning in mitigating domain shifts, the trade-offs of full fine-tuning in terms of computational cost and catastrophic forgetting, and the effectiveness of adapter-based methods such as Low-Rank Adaptation (LoRA) in reducing the number of trainable parameters while preserving model performance.
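
To make the parameter saving mentioned for LoRA concrete, the following is a minimal, dependency-free sketch of a single linear layer with a low-rank update. It is an illustration only, not the implementation used in the thesis: the class name `LoRALinear`, the helper `matmul`, the initialization values, and the `alpha / rank` scaling are assumptions chosen for this example, and the frozen weight is a random stand-in for a pre-trained one.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical layer: frozen weight W (d_out x d_in) plus a low-rank update B @ A."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen weight; a random stand-in for a pre-trained matrix (assumption).
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A is small and random, B starts at zero,
        # so the adapted layer initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        """Map a list of d_in floats to a list of d_out floats."""
        col = [[v] for v in x]
        base = matmul(self.W, col)                    # frozen path: W x
        update = matmul(self.B, matmul(self.A, col))  # low-rank path: B (A x)
        return [b[0] + self.scale * u[0] for b, u in zip(base, update)]

    def trainable_params(self):
        # Only A (rank x d_in) and B (d_out x rank) would be updated.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


if __name__ == "__main__":
    layer = LoRALinear(d_in=64, d_out=64, rank=4)
    print(layer.trainable_params(), "trainable vs", layer.frozen_params(), "frozen")
```

For a weight matrix of size d_out x d_in, the low-rank factors add rank x (d_in + d_out) trainable values, which is much smaller than d_out x d_in whenever the rank is small relative to the layer width.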

Supervisors: Filippo Molinari, Massimo Salvi, André Anjos, Oscar Jimenez
Academic year: 2024/25
Publication type: Electronic
Number of pages: 94
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master of Science > LM-32 - COMPUTER ENGINEERING
Collaborating companies: Idiap Research Institute
URI: http://webthesis.biblio.polito.it/id/eprint/35387