Grounding dialogue response generation on external knowledge has been proposed as a way to produce informative and engaging responses. However, current knowledge-grounded dialogue (KGD) systems often fail to align generated responses with human-preferred qualities because of issues such as hallucination and lack of coherence. Analyzing multiple language model generations, we observe that a single decoding process also yields alternative responses. These alternatives are more faithful, and comparably or more relevant to prior conversational turns, than the responses the decoding process ranks highest. Motivated by these observations, we propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework that enables models to generate faithful and relevant responses without requiring additional labeled data or model tuning. Through comprehensive automatic and human evaluations, we demonstrate that PICK generates responses that are more faithful while remaining relevant to the dialogue history. Furthermore, PICK consistently improves system performance with both oracle and retrieved knowledge across all decoding strategies. The detailed implementation is available at https://github.com/bryanwilie/pick.