Incorporating external knowledge into dialogue generation (KIDG) is crucial for improving the correctness of responses, where evidence fragments serve as knowledge snippets that support factual replies. However, introducing irrelevant content often degrades reply quality and easily leads to hallucinated responses. Prior work on evidence retrieval and integration in dialogue systems falls short of fully leveraging existing evidence, because the model fails to locate useful fragments accurately and overlooks the hidden evidence labels within the KIDG dataset. To fully Unleash the potential of evidence, we propose a framework to effectively incorporate Evidence in knowledge-Intensive Dialogue Generation (u-EIDG). Specifically, we introduce an automatic evidence generation framework that uses Large Language Models (LLMs) to mine reliable evidence veracity labels from unlabeled data. With these labels, we train an evidence indicator to identify relevant evidence in retrieved passages. Furthermore, we propose an evidence-augmented generator with an evidence-focused attention mechanism, which allows the model to concentrate on evidence segments. Experimental results on MultiDoc2Dial demonstrate that evidence label augmentation and the refined attention mechanism improve model performance. Further analysis confirms that the proposed method outperforms the baselines by 3 to 5 points in coherence and factual consistency.
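The abstract does not spell out how the evidence-focused attention is computed. As a rough illustration only, one common way to make attention favor flagged tokens is to add a bias to their logits before the softmax. The sketch below assumes that approach; the function name `evidence_focused_attention`, the `bias` parameter, and the toy scores are all hypothetical and are not taken from the paper.

```python
import math


def evidence_focused_attention(scores, evidence_mask, bias=1.0):
    """Softmax over attention logits after boosting evidence positions.

    scores: raw attention logits for one query, one per source token.
    evidence_mask: 1 where a token lies in a span flagged as evidence, else 0.
    bias: hypothetical additive boost (an assumption, not a value from the paper).
    """
    if len(scores) != len(evidence_mask):
        raise ValueError("scores and evidence_mask must have the same length")
    # Raise the logits of evidence tokens; other tokens are left unchanged.
    boosted = [s + bias * m for s, m in zip(scores, evidence_mask)]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(boosted)
    exps = [math.exp(b - peak) for b in boosted]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: tokens 1 and 2 are marked as evidence by an indicator.
scores = [0.2, 0.1, 0.4, 0.3]
mask = [0, 1, 1, 0]
plain = evidence_focused_attention(scores, mask, bias=0.0)
focused = evidence_focused_attention(scores, mask, bias=2.0)
```

With `bias=0.0` the function reduces to an ordinary softmax, so the two calls can be compared to see how much attention mass shifts toward the evidence positions.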