In Emotion Recognition in Conversations (ERC), the emotion of a target utterance depends closely on its context. Existing works therefore train the model to generate the response to the target utterance, with the aim of recognising emotions by leveraging contextual information. However, in many cases, generating the adjacent response ignores long-range dependencies and provides limited affective information. In addition, most ERC models learn a single unified distributed representation for each utterance, which lacks interpretability and robustness. To address these issues, we propose a VAD-disentangled Variational AutoEncoder (VAD-VAE). It first introduces a target utterance reconstruction task based on the Variational AutoEncoder, and then disentangles three affect representations, Valence, Arousal and Dominance (VAD), from the latent space. We further enhance the disentangled representations by introducing VAD supervision signals from a sentiment lexicon and by minimising the mutual information between the VAD distributions. Experiments show that VAD-VAE outperforms the state-of-the-art model on two datasets. Further analysis demonstrates the effectiveness of each proposed module and the quality of the disentangled VAD representations. The code is available at https://github.com/SteveKGYang/VAD-VAE.
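The abstract describes the training objective only at a high level. As a rough illustration, this sketch assumes that each of the three VAD factors gets its own diagonal-Gaussian latent, sampled with the reparameterisation trick, regularised with a KL term towards a standard normal prior, and supervised with a mean-squared error against lexicon VAD scores. The names `Gaussian`, `reparameterise`, `kl_to_standard_normal` and `vad_vae_loss`, the weights `beta` and `gamma`, and the MSE form of the supervision are all hypothetical. The mutual-information penalty between the VAD distributions is deliberately omitted. The linked repository is the authoritative implementation.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Gaussian:
    """Diagonal Gaussian posterior q(z|x) for one disentangled factor (hypothetical)."""
    mu: list[float]
    logvar: list[float]


def reparameterise(g: Gaussian, rng: random.Random) -> list[float]:
    # Reparameterisation trick: z = mu + sigma * eps, eps ~ N(0, 1), sigma = exp(logvar / 2)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(g.mu, g.logvar)]


def kl_to_standard_normal(g: Gaussian) -> float:
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions
    return sum(-0.5 * (1.0 + lv - m * m - math.exp(lv))
               for m, lv in zip(g.mu, g.logvar))


def vad_vae_loss(recon_nll: float,
                 latents: dict[str, Gaussian],
                 vad_preds: dict[str, float],
                 vad_targets: dict[str, float],
                 beta: float = 1.0,
                 gamma: float = 1.0) -> float:
    # recon_nll: negative log-likelihood of reconstructing the target utterance,
    # assumed to be computed elsewhere by a decoder.
    kl = sum(kl_to_standard_normal(g) for g in latents.values())
    # Assumed supervision: mean squared error against lexicon VAD scores.
    sup = sum((vad_preds[k] - vad_targets[k]) ** 2 for k in vad_targets) / len(vad_targets)
    # The mutual-information penalty between V, A and D is omitted in this sketch.
    return recon_nll + beta * kl + gamma * sup


if __name__ == "__main__":
    rng = random.Random(0)
    latents = {k: Gaussian([0.1, -0.2], [0.0, -0.5])
               for k in ("valence", "arousal", "dominance")}
    # In a real model, the VAD predictions would be regressed from these samples.
    samples = {k: reparameterise(g, rng) for k, g in latents.items()}
    preds = {"valence": 0.6, "arousal": 0.4, "dominance": 0.5}
    targets = {"valence": 0.7, "arousal": 0.3, "dominance": 0.5}
    print(samples)
    print(vad_vae_loss(2.0, latents, preds, targets, beta=0.5, gamma=1.0))
```

Keeping one latent per factor is what makes each factor individually inspectable. A single shared latent would mix valence, arousal and dominance together, which is the interpretability problem the abstract raises.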