UF-AMA: A unified framework for cross-domain emotion recognition via adaptive multimodal alignment

In recent years, emotion recognition based on physiological signals such as the electroencephalogram (EEG) has gained considerable attention, because internal physiological data are more objective and reliable than external behavioral data such as facial expressions. However, distribution shifts caused by individual and contextual differences, together with variations in sample quality across modalities, make it difficult to build a cross-domain multimodal emotion recognition model that generalizes well and remains robust. In this study, we propose a Unified Framework with Adaptive Multimodal Alignment (UF-AMA) to address cross-subject and cross-session emotion recognition using multimodal physiological signals. First, we construct a cross-modal feature fusion network comprising Transformer encoders and multi-head cross-attention modules, enabling deep integration of EEG signals and eye-tracking data. Second, we introduce a confidence-aware screening mechanism that dynamically assesses the predictive reliability of each modality branch on target-domain samples, partitions the samples into quality subsets, and applies global consistency alignment and cross-modal distillation accordingly. Finally, we propose a multi-level domain adaptation framework that jointly aligns the marginal and conditional distributions of both local modality-specific features and global fusion features, thereby reducing cross-domain distribution shifts at multiple granularities. Extensive experiments on the SEED and SEED-IV datasets demonstrate that UF-AMA achieves state-of-the-art (SOTA) performance in both cross-subject and cross-session tasks. The source code is available at: https://github.com/BetterCoderLab/UF-AMA.
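
The confidence-aware screening step can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how reliability is measured or how many subsets are formed, so the maximum-softmax-probability rule, the `threshold` value, the four subset names, and the function `screen_samples` are all assumptions made for illustration. Which loss (global consistency alignment or cross-modal distillation) is applied to which subset is likewise left open here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def screen_samples(eeg_logits, eye_logits, threshold=0.8):
    """Partition target-domain samples by per-branch confidence.

    Assumed rule (hypothetical, not taken from the paper): a modality
    branch counts as reliable on a sample when its maximum softmax
    probability is at least `threshold`.
    Returns a dict mapping subset name -> list of sample indices.
    """
    subsets = {"both": [], "eeg_only": [], "eye_only": [], "neither": []}
    for i, (le, ly) in enumerate(zip(eeg_logits, eye_logits)):
        eeg_ok = max(softmax(le)) >= threshold
        eye_ok = max(softmax(ly)) >= threshold
        if eeg_ok and eye_ok:
            key = "both"
        elif eeg_ok:
            key = "eeg_only"
        elif eye_ok:
            key = "eye_only"
        else:
            key = "neither"
        subsets[key].append(i)
    return subsets


if __name__ == "__main__":
    # Toy three-class logits for four unlabeled target-domain samples.
    eeg = [[5.0, 0.0, 0.0], [0.1, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    eye = [[0.0, 6.0, 0.0], [0.0, 0.0, 5.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.0]]
    print(screen_samples(eeg, eye))
```

In a training loop, a partition of this kind would let each subset receive a different objective, for example using the confident branch as a teacher for the other on samples where only one modality is reliable; that mapping is a design assumption rather than something the abstract specifies.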
