Dialogue state tracking (DST) aims to convert the dialogue history into dialogue states, which consist of slot-value pairs. Because the dialogue state is a condensed, structured summary of the entire history, DST models typically take the previous turn's dialogue state as input when predicting the current state. However, these models tend to keep the predicted slot values unchanged, a tendency we define as state momentum. Specifically, the models struggle both to update slot values that need to change and to correct slot values that were wrongly predicted in the previous turn. To this end, we propose MoNET, which tackles state momentum via noise-enhanced training. First, the previous state of each turn in the training data is noised by replacing some of its slot values. Then, the noised previous state is used as input for learning to predict the current state, which improves the model's ability to update and correct slot values. Furthermore, a contrastive context matching framework is designed to narrow the representation distance between a state and its corresponding noised variant, which reduces the impact of the noised state and helps the model better understand the dialogue history. Experimental results on the MultiWOZ datasets show that MoNET outperforms previous DST methods. Ablations and analysis verify the effectiveness of MoNET in alleviating state momentum and improving robustness to noise.
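The state-noising step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how slots are selected or how replacement values are chosen, so everything concrete here is an assumption made for illustration: the function name `noise_previous_state`, the per-slot replacement probability `noise_prob`, and the `ontology` mapping from each slot to its candidate values are hypothetical and are not taken from MoNET itself.

```python
import random
from typing import Dict, List, Optional

State = Dict[str, str]  # slot name -> slot value


def noise_previous_state(
    prev_state: State,
    ontology: Dict[str, List[str]],
    noise_prob: float = 0.3,
    rng: Optional[random.Random] = None,
) -> State:
    """Return a copy of prev_state with some slot values replaced.

    Assumption: each slot is noised independently with probability
    noise_prob, and the replacement is drawn uniformly from the other
    candidate values of that slot in a hypothetical ontology.
    """
    if rng is None:
        rng = random.Random()
    noised = dict(prev_state)  # leave the original state untouched
    for slot, value in prev_state.items():
        # Candidate replacements must differ from the current value.
        candidates = [v for v in ontology.get(slot, []) if v != value]
        if candidates and rng.random() < noise_prob:
            noised[slot] = rng.choice(candidates)
    return noised


# Illustrative usage with made-up slot values.
ontology = {
    "hotel-area": ["north", "south", "centre"],
    "hotel-stars": ["3", "4", "5"],
}
prev_state = {"hotel-area": "north", "hotel-stars": "4"}
noised_state = noise_previous_state(
    prev_state, ontology, noise_prob=0.5, rng=random.Random(0)
)
print(noised_state)
```

During training, a model would then receive the noised state together with the dialogue history and be asked to predict the true current state, so that it cannot succeed by simply copying its input state.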