Emotion recognition in conversation (ERC) has become an active research topic in applications such as conversational robots and question-answering systems. Retrieving contextual emotional cues efficiently and adequately remains one of the key challenges of the ERC task. Existing approaches do not fully model the context and rely on complex network structures, resulting in limited performance gains. In this paper, we propose a novel emotion recognition network based on a curriculum learning strategy (ERNetCL). ERNetCL consists primarily of a temporal encoder (TE), a spatial encoder (SE), and a curriculum learning (CL) loss. TE and SE combine the strengths of previous methods in a simple manner to efficiently capture temporal and spatial contextual information in the conversation. To mitigate the harmful influence of emotion shift and to mimic the way humans learn a curriculum from easy to hard, we apply the idea of CL to the ERC task and optimize the network parameters progressively. At the beginning of training, difficult samples are assigned lower learning weights; as the number of epochs increases, the weights for these samples are gradually raised. Extensive experiments on four datasets show that the proposed method is effective and substantially outperforms the baseline models.
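The weighting scheme described in the abstract (lower learning weights for difficult samples early in training, raised gradually over epochs) can be illustrated with a minimal sketch. The abstract does not specify how difficulty is measured or how the weights are scheduled, so everything below is an assumption made for illustration: an utterance is flagged as difficult when its label differs from that of the preceding utterance (a simplified stand-in for emotion shift), the weight ramps linearly from a hypothetical floor `w_min` to 1.0, and the function names are invented here rather than taken from ERNetCL.

```python
import math


def difficult_weight(epoch, total_epochs, w_min=0.2):
    """Weight for difficult samples at a given epoch.

    Assumed schedule: linear ramp from w_min (first epoch) to 1.0 (last epoch).
    """
    if total_epochs <= 1:
        return 1.0
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return w_min + (1.0 - w_min) * progress


def is_emotion_shift(labels):
    """Hypothetical difficulty flag: the label differs from the previous utterance."""
    return [i > 0 and labels[i] != labels[i - 1] for i in range(len(labels))]


def cl_weighted_cross_entropy(probs, labels, epoch, total_epochs, w_min=0.2):
    """Weighted mean negative log-likelihood over one conversation.

    probs:  per-utterance class probability lists (each sums to 1)
    labels: per-utterance gold class indices
    """
    if not labels:
        return 0.0
    w_hard = difficult_weight(epoch, total_epochs, w_min)
    # Easy samples keep weight 1.0; difficult ones use the scheduled weight.
    weights = [w_hard if hard else 1.0 for hard in is_emotion_shift(labels)]
    # Clamp probabilities to avoid log(0).
    losses = [-math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels)]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


if __name__ == "__main__":
    labels = [0, 0, 1]  # the third utterance shifts emotion
    probs = [[0.9, 0.1], [0.9, 0.1], [0.8, 0.2]]  # shifted utterance is misclassified
    for epoch in (0, 5, 9):
        loss = cl_weighted_cross_entropy(probs, labels, epoch, total_epochs=10)
        print(epoch, round(loss, 4))
```

In this sketch, the misclassified emotion-shift utterance contributes little to the loss in the first epoch and contributes fully in the last, so the early gradient is dominated by the easier samples. A per-sample difficulty score or a nonlinear schedule could be substituted without changing the overall structure.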