With the continued development of deep learning (DL), multimodal dialogue emotion recognition (MDER), an important branch of DL, has recently attracted extensive research attention. MDER aims to identify the emotional information carried by different modalities, e.g., text, video, and audio, in different dialogue scenes. However, existing research has focused on modeling contextual semantic information and dialogue relations between speakers while ignoring the impact of event relations on emotion. To address this issue, we propose a novel Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Emotion Recognition (DER-GCN). It models dialogue relations between speakers and captures latent event relation information. Specifically, we construct a weighted multi-relationship graph to simultaneously capture the dependencies between speakers and the event relations in a dialogue. Moreover, we introduce a Self-Supervised Masked Graph Autoencoder (SMGAE) to improve the fused representation of features and structures. Next, we design a new Multiple Information Transformer (MIT) to capture the correlation between different relations, which enables better fusion of the multivariate information across relations. Finally, we propose a loss optimization strategy based on contrastive learning to enhance representation learning for minority-class features. We conduct extensive experiments on the IEMOCAP and MELD benchmark datasets, which verify the effectiveness of DER-GCN. The results demonstrate that our model significantly improves both the average accuracy and the F1 score of emotion recognition.
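To make the idea of a contrastive-learning-based loss more concrete, the following is a minimal sketch of a generic supervised contrastive loss over a batch of utterance embeddings. It is not the loss used in DER-GCN, whose exact form the abstract does not specify; the function name `supervised_contrastive_loss`, the `temperature` parameter, and the toy inputs are all assumptions made for illustration.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative; not the DER-GCN loss).

    features: (n, d) array of embeddings, n >= 2.
    labels:   (n,) array of integer class labels.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n = features.shape[0]

    # L2-normalize so that dot products are cosine similarities.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    z = features / np.maximum(norms, 1e-12)

    # Pairwise similarities scaled by temperature; exclude self-pairs.
    sim = (z @ z.T) / temperature
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Row-wise log-softmax over all other samples (numerically stabilized).
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives are other samples that share the anchor's label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors with no positive in the batch are skipped
    if not valid.any():
        return 0.0

    # Average log-probability of positives per anchor, then over anchors.
    sum_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -sum_pos[valid] / pos_count[valid]
    return float(per_anchor.mean())


# Toy embeddings: two tight clusters in 2-D.
feats = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
loss_aligned = supervised_contrastive_loss(feats, np.array([0, 0, 1, 1]))
loss_mixed = supervised_contrastive_loss(feats, np.array([0, 1, 0, 1]))
print(loss_aligned, loss_mixed)
```

When the labels agree with the clusters, the loss is small; when same-class samples sit far apart, it is large. A loss of this kind pulls same-class embeddings together regardless of how many samples a class has, which is one plausible reason such objectives are used to help minority classes.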