Arousal and Valence Recognition in Videos: Comparing the Power of Traditional Machine Learning and Deep Learning Models

Martino Conversano

Arousal and Valence Recognition in Videos: Comparing the Power of Traditional Machine Learning and Deep Learning Models.

Supervisors: Gabriella Olmo, Gianluca Amprimo. Politecnico di Torino, Master's degree course in Biomedical Engineering, 2023


PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

This thesis explores the recognition of continuous emotional states from images and video, with the goal of improving our understanding of human emotions and the role of non-verbal cues in their expression. This is a critical area of research with numerous practical applications in fields such as mental health, human-computer interaction, and marketing. One of the most important perspectives in emotion recognition is the affective state, which can be described by two primary dimensions: arousal and valence. Arousal refers to the intensity or energy level of the emotion, while valence refers to its pleasantness or unpleasantness. More specifically, this thesis focuses on the automatic recognition of arousal and valence from video frames containing human subjects' faces, by applying machine learning and deep learning techniques.
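
As an illustration of the two-dimensional description above, the following minimal Python sketch represents an affective state as a (valence, arousal) pair and maps it to a coarse quadrant. It is not taken from the thesis: the class name AffectiveState, the [-1, 1] scaling of both dimensions, and the quadrant labels are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffectiveState:
    """A point in the two-dimensional valence-arousal space.

    Assumption: both dimensions are scaled to [-1.0, 1.0]. The abstract
    does not state the range used in the thesis.
    """

    valence: float  # pleasantness (positive) vs. unpleasantness (negative)
    arousal: float  # intensity or energy level of the emotion

    def __post_init__(self) -> None:
        # Reject values outside the assumed [-1, 1] range.
        for name in ("valence", "arousal"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [-1, 1], got {value}")

    def quadrant(self) -> str:
        """Return a coarse, purely illustrative label for the quadrant."""
        energy = "high arousal" if self.arousal >= 0.0 else "low arousal"
        tone = "positive valence" if self.valence >= 0.0 else "negative valence"
        return f"{energy}, {tone}"


if __name__ == "__main__":
    # Hypothetical per-frame prediction, not data from the thesis.
    frame_state = AffectiveState(valence=0.8, arousal=0.7)
    print(frame_state.quadrant())
```

Keeping the two dimensions as continuous values, rather than collapsing them into discrete emotion classes, matches the continuous recognition task described in the abstract. The quadrant label is only a convenience for inspecting predictions.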

The purpose of this study is to compare the performance of simpler models (e.g., SVM, MLP) with that of deep learning architectures (e.g., ResNet, VGG, MobileNet), in order to assess whether simpler models can achieve comparable performance on the task, given effective preprocessing of the input data.