The long-standing one-to-many problem in open-domain dialogue poses significant challenges for automatic evaluation methods: for a given conversational context, there may be multiple suitable responses that differ in semantics. To tackle this challenge, we propose a novel learning-based automatic evaluation metric (CMN), which can robustly evaluate open-domain dialogues by augmenting Conditional Variational Autoencoders (CVAEs) with a Next Sentence Prediction (NSP) objective and employing Mutual Information (MI) to model the semantic similarity of text in the latent space. Experimental results on two open-domain dialogue datasets demonstrate the superiority of our method compared with a wide range of baselines, especially in handling responses that are semantically distant from the golden reference responses.