Large language models often need to be grounded in external knowledge to generate faithful and reliable answers. Yet even when the correct groundings are present in the reference, a model can ignore them and hallucinate, relying instead on wrong groundings or its inherent biases. This happens when users, who are largely unaware of the specifics of the stored information, pose questions that might not directly correlate with the retrieved groundings. In this work, we formulate this knowledge alignment problem and introduce MixAlign, a framework that interacts with both the human user and the knowledge base to obtain and integrate clarifications on how the user question relates to the stored information. MixAlign employs a language model to achieve automatic knowledge alignment and, if necessary, further enhances this alignment through human user clarifications. Experimental results highlight the crucial role of knowledge alignment in boosting model performance and mitigating hallucination, with improvements of up to 22.2% and 27.1%, respectively. We also demonstrate the effectiveness of MixAlign in improving knowledge alignment by producing high-quality, user-centered clarifications.