Integrating Multi-Modal Reasoning and Explainable AI for Dermatological Image Analysis via LLM-Orchestrated Toolchains

Leonardo Sgroi

Supervisors: Flavio Giobergia, Ignazio Gallo. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Skin cancer is one of the most common and dangerous cancers worldwide, yet its detection remains a challenge even for expert dermatologists. This thesis explores how artificial intelligence can act as a trusted assistant in the diagnostic process by combining the reasoning power of Large Language Models (LLMs) with the precision of state-of-the-art vision tools. The proposed framework is a modular agent with a central reasoning core that leverages a set of specialized tools for image classification, lesion detection, patient metadata integration, and explainable AI. Through extensive experiments, the thesis evaluates the contribution of each component. First, the ability of multimodal language models such as GPT-4o and Gemini is analyzed, both in classifying dermatological images with their own vision capabilities and in interacting with vision tools. Second, the research examines the integration of patient information into the classification model via text embeddings, to determine whether this data improves the tool's performance. The project also evaluates the interpretability gains provided by concept-based explainability methods: by exploiting the annotations of the SkinCon dataset, which comprises 48 clinical concepts annotated by dermatologists, the model learns to predict these concepts before performing the final classification. This technique helps the central LLM communicate with the user by supporting its answers with a set of “proofs.” The results show that while general-purpose AI struggles on its own in fine-grained medical tasks, combining it with domain-specific tools significantly boosts performance and reliability. The agent interacts with dermatologists in natural language through a ReAct loop, providing meaningful explanations of its decisions in order to make its reasoning transparent to the user.
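The concept-based step mentioned above (predict clinical concepts first, then classify from them) can be illustrated with a minimal two-stage sketch. This is not the thesis implementation: the three concept names stand in for the 48 SkinCon concepts, the class labels are neutral placeholders, and the linear weights are arbitrary values chosen only to make the example run.

```python
import math

# Illustrative subset only; the abstract states SkinCon has 48 concepts.
CONCEPTS = ["papule", "plaque", "erythema"]
# Placeholder class labels (assumption, not the thesis label set).
CLASSES = ["class_a", "class_b"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_concepts(features, concept_weights):
    """Stage 1: map image features to one probability per concept."""
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)))
        for row in concept_weights
    ]


def classify_from_concepts(concept_probs, class_weights):
    """Stage 2: the label is computed from the predicted concepts only."""
    scores = [
        sum(w * p for w, p in zip(row, concept_probs))
        for row in class_weights
    ]
    return CLASSES[scores.index(max(scores))]


def explain(concept_probs, threshold=0.5):
    """Return the concepts the model considers present (the 'proofs')."""
    return [name for name, p in zip(CONCEPTS, concept_probs) if p >= threshold]


# Arbitrary toy inputs and weights (assumptions for illustration).
features = [1.0, -2.0]
concept_weights = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_weights = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0]]

probs = predict_concepts(features, concept_weights)
label = classify_from_concepts(probs, class_weights)
evidence = explain(probs)
```

Because the final label depends only on the concept probabilities, the list returned by `explain` is exactly the intermediate evidence the classifier used, which is what a reasoning core could quote back to the user.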
This work demonstrates the potential of combining LLM reasoning with modular vision tools to build an effective dermatological AI assistant. The agent mimics the behavior of a clinician by delegating tasks to specialized tools and integrating their outputs into a final decision. Its modularity allows for easy expansion and integration into real-world clinical workflows.
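The delegation pattern described in the abstract (a reasoning core that calls tools and folds their outputs into a final answer) follows the general shape of a ReAct loop, sketched below under stated assumptions. The tool functions, their names, the `img_001` identifier, and the scripted policy that stands in for the LLM are all hypothetical; a real agent would replace `scripted_policy` with a call to a language model.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical tools: placeholders for a lesion detector and a classifier.
def detect_lesion(image_id: str) -> str:
    return f"lesion region for {image_id} (placeholder)"


def classify_image(image_id: str) -> str:
    return f"classifier output for {image_id} (placeholder)"


TOOLS: Dict[str, Callable[[str], str]] = {
    "detect_lesion": detect_lesion,
    "classify_image": classify_image,
}


def scripted_policy(history: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: choose the next (action, argument) from the transcript."""
    seen = sum(1 for line in history if line.startswith("Observation"))
    if seen == 0:
        return ("detect_lesion", "img_001")
    if seen == 1:
        return ("classify_image", "img_001")
    return ("finish", f"summary based on {seen} tool observations")


def react_loop(question: str,
               policy: Callable[[List[str]], Tuple[str, str]],
               max_steps: int = 5) -> Tuple[str, List[str]]:
    """Alternate between choosing an action and observing its result."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        action, arg = policy(history)
        history.append(f"Action: {action}[{arg}]")
        if action == "finish":
            return arg, history
        tool = TOOLS.get(action)
        observation = tool(arg) if tool else f"unknown tool {action}"
        history.append(f"Observation: {observation}")
    return "no answer within step budget", history


answer, transcript = react_loop("Assess the lesion in img_001", scripted_policy)
```

Keeping the tools in a plain dictionary reflects the modularity the abstract emphasizes: adding a new capability only requires registering another function, without changing the loop.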
Supervisors:	Flavio Giobergia, Ignazio Gallo
Academic year:	2025/26
Publication type:	Electronic
Number of pages:	68
Subjects:
Degree programme:	Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class:	New system > Master's degree > LM-32 - INGEGNERIA INFORMATICA
Collaborating organizations:	UNIVERSITA' DEGLI STUDI DELL'INSUBRIA
URI:	http://webthesis.biblio.polito.it/id/eprint/37754
