Retrieval-augmented generation (RAG) improves performance on knowledge-intensive tasks but can be derailed by wrong, irrelevant, or conflicting retrieved text, causing models to rely on inaccurate evidence and cascade errors. We propose Knowledgeable-R1, a reinforcement-learning framework that explicitly trains large language models to use parametric knowledge (PK) to resist contextual interference while still exploiting external context when it is reliably helpful. Knowledgeable-R1 introduces a joint sampling scheme that generates paired responses with and without retrieval, and it learns both local advantages (within each decoding regime) and global advantages (across regimes under the same input) to quantify when to ignore misleading context and when to adopt it. We further employ an asymmetric advantage transformation that amplifies exploratory behavior toward parametric knowledge. Experiments show that Knowledgeable-R1 significantly improves robustness and reasoning accuracy in both knowledge-conflict and general RAG scenarios, outperforming SOTA baselines by +22.89% in counterfactual scenarios, with no degradation when the retrieved context is fully accurate. Our code is available at https://github.com/lcy80366872/knowledgeable-R1.
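The advantage computation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes GRPO-style group normalization for the local advantages, pooled normalization over both decoding regimes for the global advantages, and a hypothetical piecewise scaling (`alpha`) for the asymmetric transformation; the actual reward functions and scaling are defined in the paper and code.

```python
import numpy as np

def local_advantages(rewards):
    """Group-normalized advantages within one decoding regime
    (e.g., all rollouts sampled with retrieval, or all without)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def global_advantages(rewards_ctx, rewards_no_ctx):
    """Advantages normalized over the union of both regimes for the
    same input, so the two branches become directly comparable."""
    pooled = np.concatenate([rewards_ctx, rewards_no_ctx])
    mu, sigma = pooled.mean(), pooled.std() + 1e-8
    return ((np.asarray(rewards_ctx, dtype=float) - mu) / sigma,
            (np.asarray(rewards_no_ctx, dtype=float) - mu) / sigma)

def asymmetric_transform(adv, alpha=2.0):
    """Hypothetical asymmetric transformation: amplify positive
    advantages on the no-retrieval (parametric-knowledge) branch to
    encourage exploration toward PK, leave negative ones unchanged."""
    a = np.asarray(adv, dtype=float)
    return np.where(a > 0, alpha * a, a)
```

Normalizing globally over the paired rollouts is what lets the policy compare "answer from context" against "answer from parametric knowledge" for the same question: if the retrieved text is misleading, the no-retrieval rollouts earn higher pooled-normalized advantages and the asymmetric scaling further reinforces them.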