Reinforcement learning has recently shown promise in improving retrieval-augmented generation (RAG). Despite these advances, its effectiveness in multi-hop question answering (QA) remains constrained by two fundamental limitations: (i) the absence of global planning to structure multi-step reasoning, and (ii) unfaithful execution, which hinders effective query formulation and the consistent use of retrieved evidence. We propose GlobalRAG, a reinforcement learning framework designed to enhance global reasoning in multi-hop QA. GlobalRAG decomposes questions into subgoals, coordinates retrieval with reasoning, and refines evidence iteratively. To guide this process, we introduce a Planning Quality Reward and a SubGoal Completion Reward, which encourage coherent planning and reliable subgoal execution. In addition, a progressive weight annealing strategy balances process-oriented and outcome-based objectives. Extensive experiments on both in-domain and out-of-domain benchmarks demonstrate that GlobalRAG significantly outperforms strong baselines while using only 8k training examples (42% of the training data used by strong baselines), achieving average improvements of 14.2% in both EM and F1.
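One plausible reading of the progressive weight annealing described above, sketched here with assumed notation (the symbols $R_{\text{plan}}$, $R_{\text{sub}}$, $R_{\text{out}}$, the schedule $\alpha(t)$, and the linear decay are illustrative choices, not taken from the paper), is a training-step-dependent mixture of the process-oriented and outcome-based rewards:
\[
R(t) \;=\; \alpha(t)\,\bigl(R_{\text{plan}} + R_{\text{sub}}\bigr) \;+\; \bigl(1-\alpha(t)\bigr)\,R_{\text{out}},
\qquad
\alpha(t) \;=\; \alpha_{0}\Bigl(1 - \tfrac{t}{T}\Bigr),
\]
where $t$ is the current training step and $T$ the total number of steps, so that planning and subgoal-completion signals dominate early training while the outcome reward gradually takes over.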