The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual information. Previous ERC methods relied on simple concatenation for cross-modal fusion and ignored the information differences between modalities, leaving the model unable to focus on modality-specific emotional information. At the same time, the shared information between modalities was left unprocessed, causing information redundancy in emotion prediction. To overcome these limitations, we propose a cross-modal fusion emotion prediction network based on vector connections. The network comprises two stages: a multi-modal feature fusion stage based on connection vectors, and an emotion classification stage based on the fused features. Furthermore, we design a supervised inter-class contrastive learning module based on emotion labels. Experimental results confirm the effectiveness of the proposed method, which achieves excellent performance on the IEMOCAP and MELD datasets.
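As a rough illustration of the supervised inter-class contrastive learning module, the sketch below implements a standard supervised contrastive loss over emotion labels: utterances with the same label are pulled together as positives, while all other utterances in the batch serve as negatives. This is a minimal NumPy sketch under assumed conventions (L2-normalized embeddings, a `temperature` hyperparameter); the paper's actual module may differ in detail.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of utterance embeddings.

    features: (N, D) array of embeddings (normalized inside).
    labels:   (N,) array of emotion labels; same-label pairs are positives.
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature                 # pairwise scaled similarities
    np.fill_diagonal(sim, -np.inf)              # exclude self-comparisons
    # log-softmax over each row: log p(j | anchor i)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    positive = labels[:, None] == labels[None, :]
    np.fill_diagonal(positive, False)
    n_pos = positive.sum(axis=1)
    has_pos = n_pos > 0                         # anchors with >=1 positive
    # average log-probability of positives per anchor, negated
    pos_log_prob = np.where(positive, log_prob, 0.0).sum(axis=1)
    loss = -(pos_log_prob[has_pos] / n_pos[has_pos])
    return loss.mean()
```

In training, this loss would typically be combined with the cross-entropy classification loss on the fused features, encouraging the fusion stage to separate emotion classes in embedding space.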