Xiyang Zu
Cross-Domain Multimodal Emotion Recognition with Progressive Fusion and Conservative Domain Adaptation.
Supervisor: Giuseppe Rizzo. Politecnico di Torino, degree programme not specified, 2025
Full text: PDF (Tesi_di_laurea), Thesis, 2 MB. License: Creative Commons Attribution-NonCommercial-NoDerivatives.
Abstract:

Emotion recognition technology plays a crucial role in human-computer interaction and affective computing, with applications spanning mental health monitoring to educational technology. However, most existing emotion recognition systems suffer significant performance degradation when deployed across different environments and datasets, limiting their practical applicability and highlighting the need for robust cross-domain solutions that maintain consistent performance under varied real-world conditions. With advances in multimodal learning and domain adaptation, attention-based fusion models have shown promising results on emotion recognition tasks. Yet cross-domain emotion recognition still faces substantial challenges: domain distribution shifts, limited computational resources, and the need to balance source- and target-domain performance. Existing domain adaptation methods often exhibit training instability and catastrophic forgetting, where aggressive adversarial training sacrifices source-domain performance for marginal target-domain gains, restricting the practical deployment of these systems.

To address these issues, this study proposes a progressive multimodal fusion architecture with ultra-conservative domain adaptation for cross-domain emotion recognition. We systematically evaluate three fusion strategies to identify the optimal multimodal integration approach, implement a novel conservative alpha scheduling mechanism to ensure training stability, and incorporate computational efficiency optimizations to enable deployment in resource-constrained scenarios. Specifically, we design a comprehensive fusion comparison framework with three strategies: truly early fusion, progressive middle fusion with multi-stage cross-attention, and weighted late fusion, examined for their effectiveness in cross-domain scenarios. We focus on transferring knowledge from controlled laboratory settings (RAVDESS) to naturalistic environments (CREMA-D), implementing an ultra-conservative domain adaptation strategy whose alpha schedule ranges from 0.0001 to 0.02, in contrast to traditional methods that scale to 1.0, together with strategic target-domain subset selection that reduces computational requirements by approximately 5×.

The results demonstrate that progressive middle fusion combined with conservative domain adaptation significantly outperforms baseline approaches on cross-domain transfer tasks. In speaker-independent evaluation, our method maintains high source-domain performance while substantially improving target-domain accuracy over zero-shot transfer baselines. The conservative alpha scheduling strategy achieves superior training stability, with notably lower loss variance than traditional adversarial methods, while the efficient data balancing approach reduces training time and computational overhead without compromising performance quality. These results demonstrate the strong generalization ability and computational efficiency of the proposed approach, highlighting its potential for practical cross-domain emotion recognition in real-world applications where both performance consistency and resource constraints are critical considerations.
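For illustration, below is a minimal PyTorch sketch of what progressive middle fusion with multi-stage cross-attention could look like. The layer names, feature dimension, number of stages, and the audio/video modality pairing are assumptions for the sketch, not the thesis implementation.

```python
# Hypothetical sketch: progressive middle fusion via stacked cross-attention
# stages (PyTorch). Dimensions and stage count are illustrative assumptions.
import torch
import torch.nn as nn

class CrossAttentionStage(nn.Module):
    """One fusion stage: each modality attends to the other."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.audio_to_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.video_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, audio, video):
        # Audio queries attend over video keys/values, and vice versa;
        # residual connections preserve each stream's own information.
        a, _ = self.audio_to_video(audio, video, video)
        v, _ = self.video_to_audio(video, audio, audio)
        return self.norm_a(audio + a), self.norm_v(video + v)

class ProgressiveMiddleFusion(nn.Module):
    """Stack several cross-attention stages, then pool and classify."""
    def __init__(self, dim: int = 256, num_stages: int = 3, num_classes: int = 6):
        super().__init__()
        self.stages = nn.ModuleList(CrossAttentionStage(dim) for _ in range(num_stages))
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, audio, video):
        # Modalities exchange information progressively, stage by stage,
        # rather than being concatenated once (early) or only at the end (late).
        for stage in self.stages:
            audio, video = stage(audio, video)
        # Mean-pool over time, then concatenate the two modality streams.
        fused = torch.cat([audio.mean(dim=1), video.mean(dim=1)], dim=-1)
        return self.classifier(fused)
```

The residual connections in each stage mean every exchange only refines, rather than replaces, the modality-specific features, which is what distinguishes this progressive middle fusion from a single early concatenation.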
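Likewise, a hedged sketch of how the ultra-conservative alpha schedule might be applied to a DANN-style gradient reversal layer. The 0.0001 and 0.02 endpoints come from the abstract; the linear ramp, class names, and usage are assumptions.

```python
# Hypothetical sketch: conservative alpha scheduling for a gradient
# reversal layer (PyTorch). Only the alpha range is taken from the abstract.
import torch

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -alpha backward."""
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and dampen the domain-discriminator gradient; no grad
        # is needed for the alpha argument itself.
        return -ctx.alpha * grad_output, None

def conservative_alpha(step: int, total_steps: int,
                       alpha_min: float = 1e-4, alpha_max: float = 0.02) -> float:
    """Ramp alpha linearly from alpha_min to alpha_max, instead of the
    conventional schedule that saturates at 1.0, so the adversarial signal
    stays too weak to destabilize the source-domain task."""
    t = min(step / max(total_steps, 1), 1.0)
    return alpha_min + t * (alpha_max - alpha_min)

# Assumed usage inside a training step (encoder/discriminator are hypothetical):
# alpha = conservative_alpha(step, total_steps)
# reversed_feats = GradientReversal.apply(features, alpha)
# domain_logits = domain_discriminator(reversed_feats)
```

Capping alpha at 0.02 rather than 1.0 keeps the domain-confusion gradient roughly 50× weaker than in standard adversarial adaptation, which is consistent with the abstract's claim of lower loss variance and preserved source-domain performance.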
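Finally, one plausible reading of the strategic target-domain subset selection, sketched as class-balanced subsampling. The ~20% fraction is inferred from the abstract's approximately 5× compute reduction; the actual selection criterion is not specified, so everything here is an assumption.

```python
# Hypothetical sketch: class-balanced target-domain subsampling that keeps
# roughly a fraction of the data (0.2 ~ a 5x reduction). Strategy assumed.
import random
from collections import defaultdict

def balanced_subset(labels, fraction: float = 0.2, seed: int = 0):
    """Return indices of a class-balanced random subset of the target data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Equal quota per class so the subset stays balanced across emotions.
    per_class = int(len(labels) * fraction / max(len(by_class), 1))
    chosen = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        chosen.extend(idxs[:per_class])
    return sorted(chosen)

# Assumed usage with a hypothetical target dataset:
# subset_idx = balanced_subset(target_labels, fraction=0.2)
# target_loader = DataLoader(Subset(target_dataset, subset_idx), batch_size=32)
```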
| Field | Value |
|---|---|
| Supervisor: | Giuseppe Rizzo |
| Academic year: | 2025/26 |
| Publication type: | Electronic |
| Number of pages: | 80 |
| Subjects: | |
| Degree programme: | NOT SPECIFIED |
| Degree class: | New regulations > Master of Science > LM-32 - COMPUTER ENGINEERING |
| Collaborating companies: | FONDAZIONE LINKS |
| URI: | http://webthesis.biblio.polito.it/id/eprint/37873 |