
DreamShot: Teaching Cinema Shots to Latent Diffusion Models

Tommaso Massaglia

DreamShot: Teaching Cinema Shots to Latent Diffusion Models.

Supervisors: Tania Cerquitelli, Bartolomeo Vacchetti. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2023

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

This thesis presents a comprehensive overview of recent advances in image synthesis models, focusing on Diffusion Models and their fine-tuning. The primary contribution is a novel approach that applies recently released techniques to a relatively unexplored area of the literature: generating cinema-like shots to assist the storyboarding process. Starting from the intuition that shot types can be learned as an artistic style, I fine-tune Stable Diffusion to tailor the generation process to this purpose. Using a limited number of movie frames labelled with shot types and accompanied by brief descriptions, I apply Dreambooth together with Low-Rank Adaptation to teach a pre-trained model three specific shot types: close shot, medium shot, and long shot. The approach is also designed to run efficiently on low-power devices. The resulting images are qualitatively more pleasing and align more closely with the provided prompts and shot types. This improvement is validated through a survey of human subjects and through an evaluation with a setup similar to the one proposed in the Dreambooth paper, which shows an increase in both CLIP-T and DINO scores, with the latter improving significantly over the baseline. A detailed and easily reproducible method for creating a fine-tuning dataset is also presented, making it possible, for example, to teach the model a specific filmmaker's style. Finally, the impact of different generation parameters on the generative process is explored, and storyboarding with this method is compared with traditional storyboarding.
Overall, this thesis shows a method that produces higher-quality output, closer adherence to shot types, and greater expressiveness in the generation of cinema-like shots, making it a valuable tool for the filmmaking industry and creative individuals alike.
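
As an illustration of the two automatic metrics named in the abstract: in the Dreambooth paper's setup, CLIP-T is the average cosine similarity between the CLIP embeddings of each generated image and its prompt, and DINO is the average pairwise cosine similarity between the DINO embeddings of generated and reference images. The sketch below assumes the embeddings have already been computed and are passed in as plain lists of floats. The function names are hypothetical, and the thesis only states that its setup is "similar", so its exact procedure may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_t_score(image_embeddings, prompt_embeddings):
    """Mean similarity between each generated image and its own prompt.

    Assumes image_embeddings[i] and prompt_embeddings[i] are CLIP
    embeddings of the i-th generated image and the prompt that produced it.
    """
    sims = [
        cosine_similarity(img, txt)
        for img, txt in zip(image_embeddings, prompt_embeddings)
    ]
    return sum(sims) / len(sims)


def dino_score(generated_embeddings, reference_embeddings):
    """Mean pairwise similarity between generated and reference images.

    Assumes both arguments hold DINO image embeddings; every generated
    image is compared with every reference image.
    """
    sims = [
        cosine_similarity(gen, ref)
        for gen in generated_embeddings
        for ref in reference_embeddings
    ]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real embeddings.
    generated = [[1.0, 0.0], [0.0, 1.0]]
    prompts = [[1.0, 0.0], [1.0, 1.0]]
    references = [[1.0, 0.0], [0.0, 1.0]]
    print("CLIP-T:", clip_t_score(generated, prompts))
    print("DINO:", dino_score(generated, references))
```

Both scores lie in the range [-1, 1], and higher values mean closer agreement: with the prompt for CLIP-T, and with the reference frames for DINO.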

Supervisors: Tania Cerquitelli, Bartolomeo Vacchetti
Academic year: 2022/23
Publication type: Electronic
Number of pages: 102
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New regulations > Master's degree > LM-32 - Computer Engineering
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/27718