Using neural network models and subject-driven text-to-image generation techniques to classify and generate cinematographic shots

Arefeh Mohammad Nejad

Using neural network models and subject-driven text-to-image generation techniques to classify and generate cinematographic shots.

Supervisor: Tania Cerquitelli. Politecnico di Torino, Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica), 2024

Abstract

Cinematographic shots are the building blocks of visual storytelling in filmmaking. They represent the individual frames or images captured by a camera to create a sequence that forms a movie. Subject-driven text-to-image generation is a cutting-edge field at the intersection of artificial intelligence and visual arts. It uses machine learning models to create images from textual descriptions or prompts. By interpreting the context provided in the text, these models generate visual content that reflects the essence of the described subject. In tasks such as AI-assisted video editing and storyboarding in particular, it is essential to be able to produce images with a user-specified shot type. (A storyboard is a visual representation of how a story will play out, scene by scene. It is made up of a chronological series of images with accompanying notes.)