Multimodal multi-label emotion recognition (MMER) aims to identify the concurrent presence of multiple emotions in multimodal data. Existing studies primarily focus on improving fusion strategies and modeling modality-to-label dependencies. However, they often overlook the impact of \textbf{aleatoric uncertainty}, the inherent noise in multimodal data, which hinders the effectiveness of modality fusion by introducing ambiguity into feature representations. To address this issue and model aleatoric uncertainty effectively, this paper proposes the Latent emotional Distribution Decomposition with Uncertainty perception (LDDU) framework, which adopts a novel perspective of probabilistic modeling in the latent emotional space. Specifically, we introduce a contrastive disentangled distribution mechanism within the emotion space to model multimodal data, enabling the joint extraction of semantic features and uncertainty. Furthermore, we design an uncertainty-aware multimodal fusion method that accounts for the dispersed distribution of uncertainty and integrates distributional information. Experimental results show that LDDU achieves state-of-the-art performance on the CMU-MOSEI and M$^3$ED datasets, highlighting the importance of uncertainty modeling in MMER. Code is available at https://github.com/201983290498/lddu\_mmer.git.
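To make the distributional view concrete, the following is a minimal PyTorch sketch of the general idea the abstract describes, not the authors' implementation: each modality is mapped to a per-label Gaussian in a latent emotion space, where the mean carries semantics and the variance carries aleatoric uncertainty, and modalities are fused so that more uncertain ones contribute less. The class and function names (`GaussianEmotionHead`, `uncertainty_weighted_fusion`), the per-label factorization, and the inverse-variance fusion rule are all illustrative assumptions; LDDU's actual contrastive disentanglement and fusion design are specified in the paper body.

```python
import torch
import torch.nn as nn

class GaussianEmotionHead(nn.Module):
    """Hypothetical head: maps one modality's features to per-label Gaussians
    in a latent emotion space (mean = semantics, log-variance = uncertainty)."""
    def __init__(self, in_dim: int, num_labels: int, latent_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, num_labels * latent_dim)
        self.logvar = nn.Linear(in_dim, num_labels * latent_dim)
        self.num_labels, self.latent_dim = num_labels, latent_dim

    def forward(self, x: torch.Tensor):
        # Both outputs have shape (batch, num_labels, latent_dim).
        mu = self.mu(x).view(-1, self.num_labels, self.latent_dim)
        logvar = self.logvar(x).view(-1, self.num_labels, self.latent_dim)
        return mu, logvar

def uncertainty_weighted_fusion(mus, logvars):
    """Illustrative precision-weighted (inverse-variance) fusion of
    per-modality Gaussians: noisier modalities get smaller weights."""
    precisions = [torch.exp(-lv) for lv in logvars]   # 1 / sigma^2
    total_precision = torch.stack(precisions).sum(dim=0)
    fused_mu = torch.stack(
        [p * m for p, m in zip(precisions, mus)]
    ).sum(dim=0) / total_precision
    fused_logvar = -torch.log(total_precision)        # fused log-variance
    return fused_mu, fused_logvar

# Toy usage: three modalities (e.g., text, audio, vision), six emotion labels.
heads = [GaussianEmotionHead(128, 6, 32) for _ in range(3)]
feats = [torch.randn(4, 128) for _ in range(3)]
params = [h(f) for h, f in zip(heads, feats)]
fused_mu, fused_logvar = uncertainty_weighted_fusion(*zip(*params))
```

Precision weighting is one standard way to realize "uncertainty-aware" fusion (it is the product-of-Gaussians posterior mean); the abstract itself does not commit to this rule.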