Large language models (LLMs) acquire extensive knowledge during pre-training, referred to as their parametric knowledge. However, to remain up to date and align with human instructions, LLMs inevitably require external knowledge during their interactions with users. This raises a crucial question: how do LLMs respond when external knowledge interferes with their parametric knowledge? To investigate this question, we propose a framework that systematically elicits LLM parametric knowledge and introduces external knowledge. Specifically, we construct a parametric knowledge graph to reveal the different knowledge structures of LLMs, and we introduce external knowledge through distractors of varying degrees, methods, positions, and formats. Our experiments on both black-box and open-source models demonstrate that LLMs tend to produce responses that deviate from their parametric knowledge, particularly when they encounter direct conflicts or confounding changes of information within detailed contexts. We also find that while LLMs are sensitive to the veracity of external knowledge, they can still be distracted by unrelated information. These findings highlight the risk of hallucination when integrating external knowledge, even indirectly, during interactions with current LLMs. All the data and results are publicly available.