Conversational Emotion Recognition (CER) aims to predict the emotion expressed by each utterance (referred to as an "event") in a conversation. Existing graph-based methods focus mainly on interactions between events to capture the conversational context, and overlook the direct influence of the speaker's emotional state on those events. In addition, real-time modeling of the conversation is crucial for real-world applications but is rarely considered. To this end, we propose a novel graph-based approach, the Event-State Interactions infused Heterogeneous Graph Neural Network (ESIHGNN), which incorporates the speaker's emotional state and constructs a heterogeneous event-state interaction graph to model the conversation. Specifically, a heterogeneous directed acyclic graph neural network dynamically updates and enhances the representations of events and emotional states at each turn, thereby improving conversational coherence and consistency. To further improve CER performance, we also enrich the graph's edges with external knowledge. Experimental results on four publicly available CER datasets demonstrate the superiority of our approach and the effectiveness of the proposed heterogeneous event-state interaction graph.
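To make the idea of a heterogeneous event-state interaction graph more concrete, the following is a minimal sketch of how such a graph could be built turn by turn. It is not the ESIHGNN implementation: the abstract does not specify the edge types, so the four edge types used here (`event-event`, `state-event`, `state-state`, `event-state`), the `window` parameter, and the names `EventStateGraph`, `add_turn`, and `is_acyclic` are all assumptions made for illustration. The external knowledge on edges and the neural network itself are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class EventStateGraph:
    """Hypothetical event-state graph; the edge scheme is assumed, not from the paper."""
    window: int = 2                                   # assumed: how many past events link to a new one
    speakers: list = field(default_factory=list)      # speaker of each turn, in order
    edges: list = field(default_factory=list)         # (src, dst, edge_type) triples
    last_state: dict = field(default_factory=dict)    # speaker -> that speaker's latest state node

    def add_turn(self, speaker):
        """Add one utterance (event node) and the speaker's updated state node."""
        t = len(self.speakers)
        event, state = ("event", t), ("state", t)
        # Recent events provide context for the new event.
        for k in range(max(0, t - self.window), t):
            self.edges.append((("event", k), event, "event-event"))
        prev = self.last_state.get(speaker)
        if prev is not None:
            # The speaker's previous emotional state influences the new event
            # and carries over into the speaker's new state.
            self.edges.append((prev, event, "state-event"))
            self.edges.append((prev, state, "state-state"))
        # The new event updates the speaker's emotional state.
        self.edges.append((event, state, "event-state"))
        self.last_state[speaker] = state
        self.speakers.append(speaker)
        return event, state


def is_acyclic(edges):
    """Check acyclicity with Kahn's algorithm (repeatedly remove zero in-degree nodes)."""
    nodes = {n for src, dst, _ in edges for n in (src, dst)}
    indegree = {n: 0 for n in nodes}
    for _, dst, _ in edges:
        indegree[dst] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for src, dst, _ in edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return visited == len(nodes)


if __name__ == "__main__":
    graph = EventStateGraph(window=2)
    for speaker in ["A", "B", "A"]:
        graph.add_turn(speaker)
    for edge in graph.edges:
        print(edge)
    print("acyclic:", is_acyclic(graph.edges))
```

Under these assumptions every edge points from an earlier node to a later one, or from an event to the state created in the same turn. The graph therefore stays acyclic and can be extended one turn at a time without revisiting earlier turns, which is the property that real-time modeling relies on.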