Large language models (LLMs) have demonstrated impressive capabilities both in storing and recalling factual knowledge and in adapting to novel in-context information. Yet the mechanisms underlying their in-context grounding remain unknown, especially when in-context information contradicts the factual knowledge embedded in the parameters. This question is critical for retrieval-augmented generation methods, which enrich the context with up-to-date information in the hope that grounding can rectify outdated parametric knowledge. In this study, we introduce Fakepedia, a counterfactual dataset designed to evaluate grounding abilities when parametric knowledge clashes with in-context information. We benchmark various LLMs on Fakepedia and find that GPT-4-turbo has a strong preference for its parametric knowledge. Mistral-7B, by contrast, is the model that most robustly chooses the grounded answer. We then conduct causal mediation analysis on LLM components when answering Fakepedia queries. We demonstrate that inspection of the computational graph alone can predict LLM grounding with 92.8% accuracy, especially because a few MLPs in the Transformer can predict non-grounded behavior. Our results, together with existing findings about factual recall mechanisms, provide a coherent narrative of how grounding and factual recall mechanisms interact within LLMs.