Incorporating external knowledge into large language models (LLMs) has emerged as a promising approach to mitigating outdated knowledge and hallucination in LLMs. However, external knowledge is often imperfect: alongside useful knowledge, the context frequently contains irrelevant information or misinformation that can impair the reliability of LLM responses. This paper focuses on the external knowledge that LLMs prefer in imperfect contexts when handling multi-hop QA. Inspired by the Chain of Evidence (CoE) in criminal procedural law, we characterize the knowledge preferred by LLMs as maintaining both relevance to the question and mutual support among knowledge pieces. Accordingly, we propose an automated CoE discrimination approach and explore LLMs' preferences in terms of effectiveness, faithfulness, and robustness, as well as CoE's usability in a naive Retrieval-Augmented Generation (RAG) case. The evaluation on five LLMs reveals that CoE enhances LLMs through more accurate generation, stronger answer faithfulness, better robustness against knowledge conflict, and improved performance in a popular RAG case.