Multimodal Emotion Recognition in Conversations (ERC) is a representative multimodal learning task that exploits multiple data modalities concurrently. Prior studies on multimodal ERC face challenges in addressing modality imbalance and in optimizing learning across modalities. To address these problems, we present a novel framework named Ada2I, which consists of two inseparable modules, Adaptive Feature Weighting (AFW) and Adaptive Modality Weighting (AMW), for feature-level and modality-level balancing respectively, leveraging both inter- and intra-modal interactions. Additionally, we introduce a refined disparity ratio as part of our training optimization strategy: a simple yet effective measure of the overall discrepancy in the model's learning process when handling multiple modalities simultaneously. Experimental results validate the effectiveness of Ada2I, which achieves state-of-the-art performance compared to baselines on three benchmark datasets, particularly in addressing modality imbalance.
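To make the idea of a disparity ratio concrete, here is a minimal sketch of one plausible formulation; it is not the paper's exact definition, only an assumed proxy that compares per-modality loss magnitudes, where a value near 1.0 indicates balanced learning across modalities.

```python
def disparity_ratio(per_modality_losses, eps=1e-8):
    """Hypothetical disparity measure (an assumption, not Ada2I's exact
    formula): the ratio of the largest to the smallest per-modality loss.

    A ratio of 1.0 means every modality contributes equally to the total
    loss; larger values indicate that some modality dominates (or lags)
    the learning process, signaling modality imbalance.
    """
    if not per_modality_losses:
        raise ValueError("need at least one per-modality loss")
    # Guard against division by zero with a small epsilon floor.
    return max(per_modality_losses) / max(min(per_modality_losses), eps)


# Example: text, audio, and visual losses from one training step.
ratio = disparity_ratio([0.9, 0.3, 0.3])  # text loss dominates → ratio 3.0
```

Such a scalar could then be monitored during training, or folded into a weighting scheme that down-scales updates for the dominant modality.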