Context-grounded generation underpins many LLM applications, including long-document question answering (QA), conversational personalization, and retrieval-augmented generation (RAG). However, classic token-based context concatenation is costly for long inputs and suffers from the "lost-in-the-middle" problem at extreme context lengths. Recent work explores context parameterization, which encodes context into lightweight trainable parameters (e.g., LoRA adapters) injected into a frozen LLM. Extending this idea to retrieved evidence yields parametric RAG (P-RAG), which incorporates knowledge via parameter updates rather than token-level attention. In this paper, we present a systematic study of this emerging RAG paradigm: parametric knowledge injection. First, we reassess P-RAG under answer-presence accuracy and show that it does not consistently outperform standard token-based RAG (T-RAG), while combining both (PT-RAG) achieves the best overall performance. Second, we introduce a QA benchmark containing up-to-date knowledge beyond the LLM's internal memory to enable controlled analysis. Our representational and mechanistic analyses indicate that parametric representations capture document-level semantics and primarily influence deeper feed-forward computations, providing high-level guidance but limited evidence consolidation. Finally, we evaluate parametric injection under key RAG challenges, demonstrating improved faithfulness under knowledge conflicts, stronger robustness to retrieval noise, and solid generalization to tasks beyond QA. Our findings clarify the strengths and limitations of parametric RAG and provide practical guidance for future retrieval-augmented LLM systems.
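To make the injection mechanism concrete, the following is a minimal pure-Python sketch of a LoRA-style parametric update, an illustrative simplification rather than the paper's actual implementation: a frozen weight matrix `W` is augmented with a low-rank product `B @ A` whose entries are the trainable parameters that would encode a retrieved document. All matrix sizes and values here are hypothetical.

```python
# Illustrative sketch (assumed simplification): parametric injection adds a
# low-rank update B @ A to a frozen weight W, instead of appending the
# retrieved document as extra tokens in the prompt.

def matmul(X, Y):
    """Naive matrix product of two lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

d, r = 4, 1                                   # hidden size, adapter rank (toy values)
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base weight
A = [[0.1] * d]                               # trainable down-projection (r x d)
B = [[0.0] for _ in range(d)]                 # trainable up-projection, zero-init (d x r)

def forward(x, B, A):
    W_eff = add(W, matmul(B, A))              # effective weight: W + B @ A
    return matmul(x, [list(col) for col in zip(*W_eff)])  # x @ W_eff^T

x = [[1.0, 2.0, 3.0, 4.0]]
# With B zero-initialized (standard LoRA init), injection starts as a no-op:
assert forward(x, B, A) == x

# "Training" on a document makes B nonzero, so the frozen model's output now
# reflects the injected parameters rather than any extra context tokens.
B = [[0.5] for _ in range(d)]
print(forward(x, B, A)[0][0])                 # shifted away from the base output
```

The zero-initialized `B` mirrors the standard LoRA initialization, which guarantees the adapted model starts out identical to the frozen base model; only after the adapter is trained on the retrieved evidence does the injection change behavior.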