Recent advances in synergizing large reasoning models (LRMs) with retrieval-augmented generation (RAG) have shown promising results, yet two critical challenges remain: (1) reasoning models typically operate from a single, unchallenged perspective, which limits their ability to conduct deep, self-correcting reasoning over external documents, and (2) existing training paradigms rely excessively on outcome-oriented rewards, which provide insufficient signal for shaping the complex, multi-step reasoning process. To address these issues, we propose a Reasoner-Verifier framework named Adversarial Reasoning RAG (ARR). The Reasoner and the Verifier reason over retrieved evidence and critique each other's logic, guided by a process-aware advantage that requires no external scoring model. This advantage combines explicit observational signals with internal model uncertainty to jointly optimize reasoning fidelity and verification rigor. Experiments on multiple benchmarks demonstrate the effectiveness of our method.
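To make the reward design concrete, the following is a minimal sketch of how a process-aware, per-step signal might combine an explicit observational component (e.g., whether a cited passage actually supports the step) with the policy's own uncertainty, and then be turned into a group-relative advantage without any external scoring model. The function names, the mixing weight, and the group-normalization baseline are illustrative assumptions, not the paper's definitions.

```python
import math
from typing import List

def step_process_reward(
    evidence_hit: float,               # explicit observational signal, e.g. 1.0 if the cited
                                       # passage contains the quoted span, else 0.0
    token_logprobs: List[float],       # log-probabilities of the tokens in this reasoning step
    uncertainty_weight: float = 0.5,   # illustrative mixing weight (assumption)
) -> float:
    """Hypothetical per-step reward mixing an observation-grounded signal with
    internal model uncertainty; no external scoring model is needed because the
    uncertainty term comes from the policy's own token log-probabilities."""
    # Average negative log-likelihood as a simple uncertainty proxy (higher = less confident).
    avg_nll = -sum(token_logprobs) / max(len(token_logprobs), 1)
    confidence = math.exp(-avg_nll)    # in (0, 1]; 1 means the model was fully confident
    return (1 - uncertainty_weight) * evidence_hit + uncertainty_weight * confidence


def group_relative_advantages(step_rewards: List[float]) -> List[float]:
    """Normalize rewards within a group of rollouts (GRPO-style baseline;
    an assumption about how the advantage is formed)."""
    mean = sum(step_rewards) / len(step_rewards)
    var = sum((r - mean) ** 2 for r in step_rewards) / len(step_rewards)
    std = math.sqrt(var) + 1e-8
    return [(r - mean) / std for r in step_rewards]


# Example: the same reasoning step sampled in four rollouts.
rewards = [
    step_process_reward(1.0, [-0.1, -0.2, -0.05]),  # evidence found, confident
    step_process_reward(1.0, [-1.5, -2.0, -1.0]),   # evidence found, uncertain
    step_process_reward(0.0, [-0.3, -0.4, -0.2]),   # no evidence, confident
    step_process_reward(0.0, [-2.5, -1.8, -2.2]),   # no evidence, uncertain
]
print(group_relative_advantages(rewards))
```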