Relying on the inherent knowledge of Large Language Models (LLMs) can cause hallucinations, a lack of control, and difficulty integrating variable knowledge. To mitigate this, LLMs can be prompted to generate responses grounded in external context, typically supplied as input (knowledge-augmented models). Yet previous research often takes a narrow view of "grounding", focusing only on whether the response contains the correct answer, which does not ensure the reliability of the entire response. To address this limitation, we introduce a strict definition of grounding: a model is considered truly grounded when its responses (1) fully utilize the necessary knowledge from the provided context and (2) do not exceed the knowledge within that context. We introduce a new dataset and a grounding metric to assess this definition, and we run experiments across 13 LLMs of different sizes and training methods to provide insights into the factors that influence grounding performance. Our findings contribute to a better understanding of how to improve grounding capabilities and suggest an area of improvement toward more reliable and controllable LLM applications.
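As an illustration only, the two conditions of the strict definition can be read as a pair of set inclusions over atomic facts. The sketch below is not the paper's dataset or metric: the names `GroundingCheck` and `check_grounding` are hypothetical, and it assumes that the response, the necessary knowledge, and the context have each already been reduced to a set of comparable fact strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundingCheck:
    """Outcome of the two conditions in the strict definition (hypothetical type)."""
    uses_all_necessary: bool    # condition (1): necessary knowledge is fully used
    stays_within_context: bool  # condition (2): nothing beyond the context is stated

    @property
    def grounded(self) -> bool:
        # A response counts as grounded only when both conditions hold.
        return self.uses_all_necessary and self.stays_within_context


def check_grounding(response_facts, necessary_facts, context_facts) -> GroundingCheck:
    """Toy check, assuming facts are pre-extracted, exactly comparable strings."""
    response = set(response_facts)
    necessary = set(necessary_facts)
    context = set(context_facts)
    return GroundingCheck(
        uses_all_necessary=necessary <= response,  # necessary is a subset of response
        stays_within_context=response <= context,  # response is a subset of context
    )


# Toy example with made-up facts.
context = {"Paris is the capital of France", "France is in Europe"}
necessary = {"Paris is the capital of France"}

complete = check_grounding({"Paris is the capital of France"}, necessary, context)
overreaching = check_grounding(
    {"Paris is the capital of France", "Paris hosted the 1900 Olympics"},
    necessary,
    context,
)
print(complete.grounded, overreaching.grounded)
```

In this toy reading, a response that contains the correct answer but adds a fact absent from the context fails condition (2), which is the case an answer-only check would miss. Extracting and matching facts from free text is the hard part and is deliberately left out of the sketch.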