
Luisa Ocleppo
From Narrative to Frames: AI-Assisted Storyboarding with personalized Diffusion Models.
Supervisors: Tania Cerquitelli, Bartolomeo Vacchetti. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025
PDF (Tesi_di_laurea, 31 MB) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
This thesis presents a framework for AI-assisted storyboarding that uses state-of-the-art text-to-image diffusion models and efficient fine-tuning techniques to generate visually coherent, narrative-consistent storyboards. The work begins with a review of image synthesis architectures (from VAEs to GANs to diffusion models) and examines key components such as attention mechanisms, latent diffusion, and CLIP-based conditioning, which form the technical foundation of the study. Building on this background, the thesis surveys contemporary text-to-image systems (Stable Diffusion, GLIDE, DALL-E, Imagen, Midjourney) and fine-tuning methods such as DreamBooth, LoRA, Textual Inversion, Custom Diffusion, and ControlNet. It then turns to storyboarding, investigating how shot types shape visual narratives and drawing on recent approaches such as StoryGAN, AR-LDM, and StoryDALL-E. These findings inform the design of an interactive storyboard generation system that aims to maintain character consistency and shot-type fidelity across frames. To achieve these goals, the proposed approach combines efficient DreamBooth LoRA fine-tuning with a targeted prompt engineering and inpainting strategy. Training datasets are built from curated movie stills and synthetic character images and are used to fine-tune a pre-trained Stable Diffusion model. The interactive storyboarding system integrates automated prompt refinement via ChatGPT, user control mechanisms, and an inpainting-based module for post-generation adjustments, enabling iterative refinement of storyboard frames. Experimental evaluations, including quantitative metrics and human assessments, indicate that the proposed method preserves the stylistic characteristics of the various shot types and the identity consistency of characters.
Overall, this work builds on the state of the art in personalized text-to-image generation to offer a practical, accessible, open-source tool for pre-production and creative storytelling, bridging the gap between high-level narrative intent and detailed visual execution. |
---|---|
Supervisors: | Tania Cerquitelli, Bartolomeo Vacchetti |
Academic year: | 2024/25 |
Publication type: | Electronic |
Number of pages: | 171 |
Subjects: | |
Degree programme: | Master's degree programme in Computer Engineering (Ingegneria Informatica) |
Degree class: | New system > Master's degree > LM-32 - COMPUTER ENGINEERING |
Collaborating companies: | NOT SPECIFIED |
URI: | http://webthesis.biblio.polito.it/id/eprint/35375 |