Large Reasoning Models (LRMs) exhibit remarkable reasoning abilities but rely primarily on parametric knowledge, limiting factual accuracy. While recent works equip reinforcement learning (RL)-based LRMs with retrieval capabilities, they suffer from overthinking and lack robustness in reasoning, reducing their effectiveness in question answering (QA) tasks. To address this, we propose ReaRAG, a factuality-enhanced reasoning model that explores diverse queries without excessive iterations. Our solution includes a novel data construction framework with an upper bound on the reasoning chain length. Specifically, we first leverage an LRM to generate deliberate thinking, then select an action from a predefined action space (Search and Finish). For the Search action, a query is executed against the RAG engine, and the result is returned as an observation to guide subsequent reasoning steps. This process iterates until a Finish action is chosen. Benefiting from ReaRAG's strong reasoning capabilities, our approach outperforms existing baselines on multi-hop QA. Further analysis highlights its strong reflective ability to recognize errors and refine its reasoning trajectory. Our study enhances LRMs' factuality while effectively integrating robust reasoning for Retrieval-Augmented Generation (RAG).
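The iterative think–act–observe loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: all names (`generate_thought`, `select_action`, the `rag_engine.search` interface) and the specific step limit are hypothetical placeholders assumed for the sketch.

```python
# Minimal sketch of the ReaRAG reasoning loop (hypothetical API).
MAX_STEPS = 8  # assumed upper bound on the reasoning chain length


def rearag_answer(question, lrm, rag_engine, max_steps=MAX_STEPS):
    context = [question]
    for _ in range(max_steps):
        thought = lrm.generate_thought(context)       # deliberate thinking
        action, arg = lrm.select_action(thought)      # "search" or "finish"
        if action == "finish":
            return arg                                # final answer
        observation = rag_engine.search(arg)          # execute query via RAG
        context += [thought, observation]             # guide later steps
    # Upper bound reached: force a final answer from the context so far.
    return lrm.generate_thought(context + ["Finish now."])
```

The `max_steps` cap mirrors the paper's upper bound on chain length, which is what prevents the overthinking behavior the abstract mentions.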