Large Language Models (LLMs) excel at reasoning and generation but are inherently limited by static pretraining data, resulting in factual inaccuracies and weak adaptability to new information. Retrieval-Augmented Generation (RAG) addresses this issue by grounding LLMs in external knowledge; however, its effectiveness critically depends on whether the model can adequately access relevant information. Existing RAG systems rely on a single retriever with fixed top-k selection, restricting access to a narrow and static subset of the corpus. This single-retriever paradigm has therefore become the primary bottleneck for comprehensive acquisition of external information, especially in tasks requiring corpus-level reasoning. To overcome this limitation, we propose MARAG-R1, a reinforcement-learned multi-tool RAG framework that enables LLMs to dynamically coordinate multiple retrieval mechanisms for broader and more precise information access. MARAG-R1 equips the model with four retrieval tools (semantic search, keyword search, filtering, and aggregation) and trains it to learn both how and when to use them through a two-stage process: supervised fine-tuning followed by reinforcement learning. This design allows the model to interleave reasoning and retrieval, progressively gathering sufficient evidence for corpus-level synthesis. Experiments on GlobalQA, HotpotQA, and 2WikiMultiHopQA demonstrate that MARAG-R1 substantially outperforms strong baselines and achieves new state-of-the-art results on corpus-level reasoning tasks.
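To make the interleaved reason-retrieve loop concrete, the sketch below runs a toy version of it over a small in-memory corpus. It is a minimal illustration under stated assumptions, not MARAG-R1's implementation: the four tool functions, the corpus, and the rule-based step "policy" are hypothetical stand-ins, since in MARAG-R1 the policy is the LLM itself, trained with supervised fine-tuning followed by reinforcement learning, and the abstract does not specify the tools' actual interfaces.

```python
# Illustrative sketch only: tool implementations, corpus, and the fixed-rule
# policy are hypothetical stand-ins for MARAG-R1's learned LLM policy.

CORPUS = [
    {"title": "Paris",  "text": "Paris is the capital of France",  "year": 1998},
    {"title": "Berlin", "text": "Berlin is the capital of Germany", "year": 1990},
    {"title": "Madrid", "text": "Madrid is the capital of Spain",  "year": 1995},
]

def keyword_search(query):
    """Substring match, standing in for a sparse (e.g. BM25) retriever."""
    return [d for d in CORPUS if query.lower() in d["text"].lower()]

def semantic_search(query, k=2):
    """Toy 'dense' retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    score = lambda d: len(q & set(d["text"].lower().split()))
    return sorted(CORPUS, key=score, reverse=True)[:k]

def filter_docs(docs, predicate):
    """Corpus-level filtering: keep only documents meeting a condition."""
    return [d for d in docs if predicate(d)]

def aggregate(docs, key):
    """Corpus-level aggregation: count documents per value of a field."""
    counts = {}
    for d in docs:
        counts[d[key]] = counts.get(d[key], 0) + 1
    return counts

def answer(question, max_steps=4):
    """Interleave 'reasoning' (here: fixed rules) with retrieval tool calls.
    MARAG-R1 instead lets the RL-trained LLM choose a tool at each step."""
    evidence = []
    for _ in range(max_steps):
        if not evidence:                 # step 1: broad semantic recall
            evidence = semantic_search(question)
        elif len(evidence) > 1:          # step 2: narrow down by filtering
            evidence = filter_docs(evidence, lambda d: d["year"] >= 1998)
        else:                            # evidence sufficient: synthesize
            return evidence[0]["text"]
    return None

print(answer("capital of France"))   # -> Paris is the capital of France
print(aggregate(CORPUS, "year"))     # corpus-level statistic over all documents
```

The point of the loop structure is that each retrieval step conditions on the evidence gathered so far, so broad recall, filtering, and aggregation can be composed adaptively rather than fixed in advance by a single top-k retriever.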