Large Language Models (LLMs) often encounter conflicts between their learned internal knowledge (parametric knowledge, PK) and external knowledge provided during inference (contextual knowledge, CK). Understanding how LLMs prioritize one knowledge source over the other remains a challenge. In this paper, we propose a novel probing framework to explore the mechanisms governing the selection between PK and CK in LLMs. Using controlled prompts designed to contradict the model's PK, we demonstrate that specific model activations are indicative of the knowledge source employed. We evaluate this framework on LLMs of various sizes and show that mid-layer activations, particularly those related to relations in the input, are crucial for predicting knowledge source selection, paving the way for more reliable models capable of handling knowledge conflicts effectively.
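The probing idea described above can be sketched with a small, self-contained example. Real LLM activations are replaced here by synthetic vectors, and all specifics (the number of examples, the hidden dimension, the 0/1 labels standing for "answered from PK" vs. "answered from CK") are illustrative assumptions rather than the paper's actual setup; the point is only that a linear probe classifying well above chance would indicate the activations encode the knowledge source.

```python
# Minimal linear-probe sketch with synthetic stand-ins for mid-layer
# LLM activations. Hypothetical setup: label 0 = model answered from
# parametric knowledge (PK), label 1 = answered from contextual
# knowledge (CK).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_examples, hidden_dim = 1000, 64
labels = rng.integers(0, 2, size=n_examples)
activations = rng.normal(size=(n_examples, hidden_dim))
# Inject a separable signal in a few dimensions for the CK class,
# mimicking activations that carry information about the source.
activations[labels == 1, :8] += 1.5

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.2, random_state=0
)

# Train a logistic-regression probe on the "activations" and measure
# how well it predicts the knowledge source on held-out examples.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = probe.score(X_test, y_test)
print(f"probe accuracy: {accuracy:.2f}")
```

In the framework described in the abstract, the same probe would be trained per layer on real activations, and comparing probe accuracy across layers is what would surface the role of mid-layer representations.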