In human-computer interaction, correctly understanding a user's emotional state during a conversation is becoming increasingly important, so multimodal emotion recognition (MER) has received growing attention. However, existing emotion classification methods usually classify each sentence only once, and a sentence is likely to be misclassified in a single round of classification. In addition, previous work usually ignores the similarities and differences between the features of different modalities during fusion. To address these issues, we propose a two-stage emotion recognition model based on graph contrastive learning (TS-GCL). First, we encode the original dataset with modality-specific preprocessing. Second, we introduce a graph contrastive learning (GCL) strategy for the three modalities, whose data have different structures, to learn similarities and differences within and between modalities. Finally, we apply an MLP twice to obtain the final emotion classification. This staged classification helps the model focus on different levels of emotional information, thereby improving its performance. Extensive experiments show that TS-GCL outperforms previous methods on the IEMOCAP and MELD datasets.
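To make the idea of learning similarities and differences between modalities more concrete, the following is a minimal sketch of an InfoNCE-style cross-modal contrastive loss. It is an illustration under stated assumptions, not the TS-GCL objective itself: the abstract does not specify the loss, the graph construction, or the encoders, so the function names (`cosine`, `info_nce`), the `temperature` value, and the toy embeddings are all hypothetical. Embeddings of the same utterance from two modalities are treated as a positive pair, and all other pairings in the batch as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchor_views, other_views, temperature=0.5):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    Row i of each list is assumed to describe the same utterance in a
    different modality, so (i, i) is the positive pair and (i, j != i)
    are negatives.
    """
    assert len(anchor_views) == len(other_views)
    total = 0.0
    for i, anchor in enumerate(anchor_views):
        logits = [cosine(anchor, other) / temperature for other in other_views]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchor_views)


# Toy, made-up utterance embeddings for two modalities.
text_emb = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
audio_emb = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

aligned_loss = info_nce(text_emb, audio_emb)
# Rotating the second modality breaks the utterance alignment.
shuffled_loss = info_nce(text_emb, audio_emb[1:] + audio_emb[:1])
print(aligned_loss < shuffled_loss)
```

In this toy setting the loss is lower when the two modalities are aligned per utterance than when the pairing is shuffled, which is the behavior a contrastive objective is meant to reward. An intra-modal term could be sketched the same way by contrasting two views of the same modality.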