One critical issue for chat systems is staying consistent about their own preferences, opinions, beliefs, and facts, which has been shown to be a difficult problem. In this work, we study methods to assess and bolster the utterance consistency of chat systems. We first develop a dataset for studying inconsistencies, in which annotators author inconsistent dialogue responses, explanations of the inconsistencies, and recovery utterances. This covers the life cycle of an inconsistency: introduction, understanding, and resolution. Building on this dataset, we introduce a set of tasks centered on dialogue consistency, focusing on detection and resolution. Our experimental findings indicate that the dataset significantly helps progress in identifying and resolving conversational inconsistencies, and that popular large language models such as ChatGPT, while good at resolving inconsistencies, still struggle to detect them.