Large Language Models (LLMs) excel at reasoning and generation but are inherently limited by static pretraining data, resulting in factual inaccuracies and weak adaptability to new information. Retrieval-Augmented Generation (RAG) addresses this issue by grounding LLMs in external knowledge; however, its effectiveness critically depends on whether the model can adequately access relevant information. Existing RAG systems rely on a single retriever with fixed top-k selection, restricting access to a narrow and static subset of the corpus. This single-retriever paradigm has therefore become the primary bottleneck for comprehensive acquisition of external information, especially in tasks requiring corpus-level reasoning. To overcome this limitation, we propose MARAG-R1, a reinforcement-learned multi-tool RAG framework that enables LLMs to dynamically coordinate multiple retrieval mechanisms for broader and more precise information access. MARAG-R1 equips the model with four retrieval tools (semantic search, keyword search, filtering, and aggregation) and trains it to learn both how and when to use them through a two-stage process: supervised fine-tuning followed by reinforcement learning. This design allows the model to interleave reasoning and retrieval, progressively gathering sufficient evidence for corpus-level synthesis. Experiments on GlobalQA, HotpotQA, and 2WikiMultiHopQA demonstrate that MARAG-R1 substantially outperforms strong baselines and achieves new state-of-the-art results on corpus-level reasoning tasks.
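To make the interleaved reason-retrieve loop concrete, the sketch below runs a toy version of it over a small in-memory corpus. It is a minimal illustration under stated assumptions, not MARAG-R1's implementation: the four tool functions, the corpus, and the rule-based step "policy" are hypothetical stand-ins, since in MARAG-R1 the policy is the LLM itself, trained with supervised fine-tuning followed by reinforcement learning, and the abstract does not specify the tools' actual interfaces.

```python
# Illustrative sketch only: tool implementations, corpus, and the fixed-rule
# policy are hypothetical stand-ins for MARAG-R1's learned LLM policy.

CORPUS = [
    {"title": "Paris",  "text": "Paris is the capital of France",  "year": 1998},
    {"title": "Berlin", "text": "Berlin is the capital of Germany", "year": 1990},
    {"title": "Madrid", "text": "Madrid is the capital of Spain",  "year": 1995},
]

def keyword_search(query):
    """Substring match, standing in for a sparse (e.g. BM25) retriever."""
    return [d for d in CORPUS if query.lower() in d["text"].lower()]

def semantic_search(query, k=2):
    """Toy 'dense' retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    score = lambda d: len(q & set(d["text"].lower().split()))
    return sorted(CORPUS, key=score, reverse=True)[:k]

def filter_docs(docs, predicate):
    """Corpus-level filtering: keep only documents meeting a condition."""
    return [d for d in docs if predicate(d)]

def aggregate(docs, key):
    """Corpus-level aggregation: count documents per value of a field."""
    counts = {}
    for d in docs:
        counts[d[key]] = counts.get(d[key], 0) + 1
    return counts

def answer(question, max_steps=4):
    """Interleave 'reasoning' (here: fixed rules) with retrieval tool calls.
    MARAG-R1 instead lets the RL-trained LLM choose a tool at each step."""
    evidence = []
    for _ in range(max_steps):
        if not evidence:                 # step 1: broad semantic recall
            evidence = semantic_search(question)
        elif len(evidence) > 1:          # step 2: narrow down by filtering
            evidence = filter_docs(evidence, lambda d: d["year"] >= 1998)
        else:                            # evidence sufficient: synthesize
            return evidence[0]["text"]
    return None

print(answer("capital of France"))   # -> Paris is the capital of France
print(aggregate(CORPUS, "year"))     # corpus-level statistic over all documents
```

The point of the loop structure is that each retrieval step conditions on the evidence gathered so far, so broad recall, filtering, and aggregation can be composed adaptively rather than fixed in advance by a single top-k retriever.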