Tianming Qu
A Multimodal Encoder of Music and Image for Valence Arousal Prediction.
Supervisors: Giuseppe Rizzo, Luca Barco, Angelica Urbanelli. Politecnico di Torino, Master's degree program in Data Science And Engineering, 2023
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:
Emotion analysis, a fundamental component of human-computer interaction, influences many domains, including content recommendation, image generation, and psychological research. Images and music, as crystallizations of human culture, inherently carry the emotions their creators embedded in them, and analyzing the emotions these works convey has long been a prominent research direction. Recent work in emotion analysis falls broadly into two streams: classification into discrete emotion labels and prediction of continuous valence-arousal values. My work focuses on valence-arousal prediction. Valence measures the pleasure or displeasure elicited by a stimulus, while arousal measures its degree of excitement or calmness; together, these two dimensions are central to describing human emotions. In recent years, rapid progress in computer vision and audio processing research has led to breakthroughs in image and audio analysis. At the same time, multimedia applications that combine music and images, from advertising to films to virtual reality experiences, have become increasingly popular, and multimodal analysis holds great promise for them. Against this background, my research builds a multimodal emotion prediction model based on metric learning. In the experiments, I compare two encoder architectures: one based on a CNN (ResNet) and one based on more recent transformers. Several training losses are combined so that the model not only learns a latent embedding space shared by both modalities but also learns the label space of each modality. I evaluate both encoder types within this framework, aiming to establish a foundation for subsequent research.
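The abstract does not spell out which losses are used. Purely as an illustration of the kind of combination it describes, the sketch below assumes one common recipe: a symmetric InfoNCE-style contrastive term that pulls matched image/music embeddings together in the shared space, plus a linear valence-arousal regression head per modality trained with mean squared error. All function names, the temperature, the loss weight, and the linear heads are hypothetical choices for this sketch, not the thesis's implementation; random NumPy arrays stand in for the ResNet or transformer encoder outputs.

```python
# Hypothetical sketch: NOT the thesis's actual loss functions.
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_infonce(img_emb, mus_emb, temperature=0.1):
    """Contrastive alignment term; row i of each input is a matched image/music pair."""
    img, mus = l2_normalize(img_emb), l2_normalize(mus_emb)
    logits = img @ mus.T / temperature  # (batch, batch) scaled cosine similarities
    idx = np.arange(len(img))
    img_to_mus = -log_softmax(logits, axis=1)[idx, idx].mean()  # each image picks its track
    mus_to_img = -log_softmax(logits, axis=0)[idx, idx].mean()  # each track picks its image
    return 0.5 * (img_to_mus + mus_to_img)


def va_regression_loss(emb, head_w, head_b, va_targets):
    """MSE of an assumed linear head predicting (valence, arousal) from embeddings."""
    pred = emb @ head_w + head_b  # shape (batch, 2)
    return float(np.mean((pred - va_targets) ** 2))


def combined_loss(img_emb, mus_emb, img_head, mus_head, img_va, mus_va,
                  temperature=0.1, va_weight=1.0):
    """Shared-space alignment term plus one label-space term per modality."""
    align = symmetric_infonce(img_emb, mus_emb, temperature)
    label = (va_regression_loss(img_emb, *img_head, img_va)
             + va_regression_loss(mus_emb, *mus_head, mus_va))
    return align + va_weight * label


# Usage with random stand-ins for encoder outputs and labels.
rng = np.random.default_rng(0)
batch, dim = 4, 8
img_emb = rng.normal(size=(batch, dim))
mus_emb = img_emb + 0.05 * rng.normal(size=(batch, dim))  # nearly aligned pairs
head = (0.1 * rng.normal(size=(dim, 2)), np.zeros(2))  # hypothetical (weights, bias), reused for both modalities
va_targets = rng.uniform(-1.0, 1.0, size=(batch, 2))
print(combined_loss(img_emb, mus_emb, head, head, va_targets, va_targets))
```

Swapping the contrastive term for a triplet or other margin-based metric-learning loss would leave the structure unchanged: one term shapes the shared space, and the per-modality terms tie that space to the valence-arousal labels.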
Supervisors: Giuseppe Rizzo, Luca Barco, Angelica Urbanelli
Academic year: 2023/24
Publication type: Electronic
Number of pages: 87
Degree program: Master's degree program in Data Science And Engineering
Degree class: LM-32 - COMPUTER SYSTEMS ENGINEERING (Master of Science, new organization)
Collaborating companies: FONDAZIONE LINKS
URI: http://webthesis.biblio.polito.it/id/eprint/29591