Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context. Yet the mechanisms underlying this contextual grounding remain unknown, especially in situations where contextual information contradicts factual knowledge stored in the parameters, which LLMs also excel at recalling. Favoring the contextual information is critical for retrieval-augmented generation methods, which enrich the context with up-to-date information in the hope that grounding can rectify outdated or noisy stored knowledge. We present a novel method to study grounding abilities using Fakepedia, a dataset of counterfactual texts constructed to clash with a model's internal parametric knowledge. We benchmark various LLMs with Fakepedia and then conduct a causal mediation analysis of LLM components, based on our Masked Grouped Causal Tracing (MGCT) method, as the models answer Fakepedia queries. This analysis reveals distinct computational patterns for grounded and ungrounded responses. Finally, we demonstrate that grounded responses can be distinguished from ungrounded ones through computational analysis alone. Our results, together with existing findings about factual recall mechanisms, provide a coherent narrative of how grounding and factual recall mechanisms interact within LLMs.