The tendency of current large language models (LLMs) to hallucinate negatively impacts dialogue systems: hallucinations produce factually incorrect responses that may mislead users and undermine trust in the system. Existing refinement methods for dialogue systems typically operate at the response level, overlooking the fact that a single response may contain multiple verifiable or unverifiable facts. To address this gap, we propose Fine-Refine, a fine-grained refinement framework that decomposes responses into atomic units, verifies each unit against external knowledge, assesses fluency via perplexity, and iteratively corrects granular errors. We evaluate factuality on the HybriDialogue and OpendialKG datasets in terms of factual accuracy (fact score) and coverage (Not Enough Information proportion). Experiments show that Fine-Refine substantially improves factuality, achieving up to a 7.63-point gain in dialogue fact score, with only a small trade-off in dialogue quality.
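The decompose-verify-correct loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the helper names (`decompose`, `verify`, `refine`), the toy knowledge store, and the naive sentence-based decomposition are all assumptions made for clarity.

```python
# Hypothetical sketch of a fine-grained refinement loop: split a response
# into atomic units, verify each against a knowledge store, and replace
# refuted units with corrections. All names and data here are illustrative.

# Toy external knowledge store mapping atomic claims to truth values.
KNOWLEDGE = {
    "Paris is the capital of France": True,
    "The Eiffel Tower is in Berlin": False,
}

def decompose(response: str) -> list[str]:
    """Split a response into atomic units (naive sentence split)."""
    return [s.strip() for s in response.split(".") if s.strip()]

def verify(fact: str) -> str:
    """Label a unit as supported, refuted, or NEI (not enough information)."""
    if fact not in KNOWLEDGE:
        return "NEI"
    return "supported" if KNOWLEDGE[fact] else "refuted"

def refine(response: str, corrections: dict[str, str]) -> str:
    """Replace refuted units with corrected versions; keep the rest."""
    units = decompose(response)
    fixed = [
        corrections.get(u, u) if verify(u) == "refuted" else u
        for u in units
    ]
    return ". ".join(fixed) + "."

resp = "Paris is the capital of France. The Eiffel Tower is in Berlin."
print(refine(resp, {"The Eiffel Tower is in Berlin":
                    "The Eiffel Tower is in Paris"}))
```

In the actual framework, verification would query an external knowledge source rather than a static dictionary, and a perplexity check would gate each rewritten response for fluency before the next iteration.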