
Machine Learning & Cinema: Color grading style classification through Vectorscopes and CNNs

Krzysztof Kleist


Supervisors: Tania Cerquitelli, Bartolomeo Vacchetti. Politecnico di Torino, Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica), 2023

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Identifying a movie's genre from a single frame is challenging but possible, because genres have distinct visual elements. Action films tend to feature fast-paced scenes, intense lighting, and dynamic camera angles. Horror movies pair dark colors with unsettling audio. Comedies combine bright, colorful frames with humor. Clothing, sets, and ambiance provide further genre clues.
Convolutional Neural Networks (CNNs) play a crucial role in movie classification, including genre prediction. Early research focused on audio-visual cues for mood-based genre analysis, while later work targeted movie trailers and improved accuracy. Movie posters have also been explored, with varying success, and deep learning models significantly improved accuracy. Text-based approaches, like predicting titles from plot summaries, achieved high accuracy across genres. My research aligns with these studies, emphasizing deep learning for frame-based genre classification. I aim to extend this work by exploring the relationship between genre and individual movie frames, and by investigating whether vectorscope images can improve model performance.
Obtaining the right dataset was crucial. Since online sources of labeled movie data had copyright restrictions, I compiled my own dataset. It came from a repository of 3,000 movies, 30,000 captioned clips, and 1,000 hours of video. From it I extracted the relevant data and created a table of 27,077 rows, including videoid and genre. Using Python libraries such as pafy and OpenCV, I refined the table, filtering it down to 17,724 rows. I then detected grayscale videos by examining pixel variations and removed them, leaving 16,263 rows suitable for analysis. These rows spanned 23 genres with an imbalanced distribution. To address this, I focused on the eight most common genres and allowed each movie to have up to two labels.
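
The grayscale filtering mentioned above ("examining pixel variations") could be sketched roughly as follows. This is only an illustration, not the thesis code: it assumes frames are already decoded into H x W x 3 uint8 arrays (for example by OpenCV), and the function name `is_grayscale` and the `tolerance` threshold are hypothetical choices.

```python
import numpy as np


def is_grayscale(frame: np.ndarray, tolerance: float = 8.0) -> bool:
    """Return True if an H x W x 3 uint8 frame carries almost no color.

    `tolerance` is a hypothetical threshold (in 8-bit intensity levels)
    that absorbs small channel differences caused by video compression.
    """
    # Cast to a signed type so channel differences cannot wrap around.
    pixels = frame.astype(np.int16)
    # Per-pixel spread between the largest and smallest channel value;
    # a true grayscale pixel has identical channels, so its spread is 0.
    spread = pixels.max(axis=2) - pixels.min(axis=2)
    # Average over the frame so a few noisy pixels do not flip the result.
    return bool(spread.mean() <= tolerance)
```

A video could then be flagged as grayscale when, say, all of a handful of sampled frames pass this check; how many frames to sample is left open here.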
Data collection used the youtube_dl library; frame extraction was parallelized for efficiency and skipped grayscale videos. Genres were binary-encoded to support multi-label classification. For the train-test split, I ensured genre balance in both subsets and conducted a visual analysis. I implemented two splitting approaches, random and separated by movie, to assess the model's generalization.
I created a script for vectorscope analysis that resizes images, converts them to arrays, and extracts color data. Data augmentation, including rotations and color adjustments, was used to improve model robustness. Two strategies were employed: uniform transformations for all frames and intensified augmentation for underrepresented genres. A custom class manages data loading, resizing, and normalization for CNN training.
Initial experiments focused on dataset configuration, hyperparameter tuning (learning rate and overfitting detection), and computation time. AlexNet served as a benchmark for preliminary results, comparing the raw image and vectorscope datasets. Two final approaches emerged, both chosen for performance and computational efficiency: no augmentation for raw images, and augmentation of underrepresented classes for vectorscope representations. Final experiments involved both datasets and three models: AlexNet, VGG-16, and ResNet-50. They were also repeated with a clip-separated dataset to assess robustness.
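
To make the vectorscope idea concrete: a vectorscope plots the chroma of every pixel on a two-dimensional plane, discarding brightness. The sketch below is one possible way to compute such an image and is not taken from the thesis. It assumes an RGB uint8 frame, uses the full-range BT.601 (JPEG-style) RGB-to-YCbCr coefficients as an assumed color conversion, and omits the resizing step; the function name `vectorscope` and the `bins` parameter are hypothetical.

```python
import numpy as np


def vectorscope(frame: np.ndarray, bins: int = 256) -> np.ndarray:
    """Return a bins x bins histogram of (Cr, Cb) chroma values.

    `frame` is assumed to be an H x W x 3 uint8 array in RGB order.
    Row index follows Cr, column index follows Cb.
    """
    # Flatten to a list of pixels and work in floating point.
    rgb = frame.reshape(-1, 3).astype(np.float64)
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    # Full-range BT.601 chroma components (an assumed conversion);
    # neutral gray maps to roughly (128, 128), the center of the plot.
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Count how many pixels fall into each chroma cell.
    hist, _, _ = np.histogram2d(cr, cb, bins=bins, range=[[0, 256], [0, 256]])
    return hist
```

The resulting array can be normalized and saved as a single-channel image, so that a frame with a strong color cast shows up as mass displaced from the center of the plot. Whether a CNN should receive raw counts or a log-scaled version is a design choice this sketch leaves open.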

Supervisors: Tania Cerquitelli, Bartolomeo Vacchetti
Academic year: 2023/24
Publication type: Electronic
Number of pages: 124
Subjects:
Degree programme: Master's degree in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica)
Degree class: New system > Master's degree > LM-32 - Computer Engineering
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/29472