From Narrative to Frames: AI-Assisted Storyboarding with personalized Diffusion Models

Luisa Ocleppo

From Narrative to Frames: AI-Assisted Storyboarding with personalized Diffusion Models.

Supervisors: Tania Cerquitelli, Bartolomeo Vacchetti. Politecnico di Torino, Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica), 2025

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

This thesis presents a framework for AI-assisted storyboarding that leverages state-of-the-art text-to-image diffusion models and efficient fine-tuning techniques to generate visually coherent and narratively consistent storyboards. The work begins with a comprehensive review of image synthesis architectures, from VAEs to GANs to diffusion models, and explores critical components such as attention mechanisms, latent diffusion, and CLIP-based conditioning, establishing a solid technical foundation for the study. Building on this background, the thesis surveys contemporary text-to-image systems (Stable Diffusion, GLIDE, DALL-E, Imagen, Midjourney) and fine-tuning methodologies such as DreamBooth, LoRA, Textual Inversion, Custom Diffusion, and ControlNet. The work then turns to storyboarding, investigating how shot types shape visual narratives and synthesizing insights from recent approaches such as StoryGAN, AR-LDM, and StoryDALL-E. These findings directly inform the design of an interactive storyboard generation system that aims to maintain character consistency and shot type fidelity across frames. To achieve these goals, the proposed approach combines efficient DreamBooth LoRA fine-tuning with a targeted prompt engineering and inpainting strategy. High-quality training datasets are constructed from curated movie stills and synthetic character images to refine a pre-trained Stable Diffusion model. The proposed interactive storyboarding system integrates automated prompt refinement via ChatGPT, user control mechanisms, and an inpainting-based module for post-generation adjustments, enabling iterative enhancement of storyboard frames. Experimental evaluations, including quantitative metrics and human assessments, demonstrate that the proposed method effectively preserves the stylistic characteristics of various shot types and the identity consistency of characters.
Overall, this work leverages the state of the art in personalized text-to-image generation to offer a practical, accessible, and open-source tool for pre-production and creative storytelling, providing a solution that bridges the gap between high-level narrative intent and detailed visual execution.

Supervisors: Tania Cerquitelli, Bartolomeo Vacchetti
Academic year: 2024/25
Publication type: Electronic
Number of pages: 171
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica)
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING (INGEGNERIA INFORMATICA)
Collaborating companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/35375