Large Language Models (LLMs), despite their remarkable capabilities, are prone to generating hallucinated or outdated content because their internal knowledge is static. While Retrieval-Augmented Generation (RAG) trained with Reinforcement Learning (RL) offers a remedy, such methods are fundamentally constrained by a single-query mode, leading to prohibitive latency and inherent brittleness. To overcome these limitations, we introduce RAG-R1, a novel two-stage training framework centered on multi-query parallelism. Our framework enables LLMs to adaptively leverage internal and external knowledge during the reasoning process while transitioning from the single-query mode to multi-query parallelism. This architectural shift bolsters reasoning robustness while significantly reducing inference latency. Extensive experiments on seven question-answering benchmarks confirm the superiority of our method, which outperforms the strongest baseline by up to 13.7% and decreases inference time by 11.1%.
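The latency argument behind multi-query parallelism can be illustrated with a minimal sketch. This is not the paper's implementation: the `retrieve` function below is a hypothetical stand-in for a real retriever call, with a fixed sleep modeling retrieval round-trip latency; the point is only that N concurrent queries cost roughly one round-trip instead of N.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def retrieve(query: str) -> str:
    # Hypothetical stand-in for a retriever (e.g., a search backend);
    # the sleep models one I/O round-trip of retrieval latency.
    time.sleep(0.1)
    return f"docs for: {query}"

queries = [
    "who founded X?",
    "when was X founded?",
    "where is X headquartered?",
]

# Single-query mode: retrieval calls are issued one after another,
# so latency grows linearly with the number of queries.
t0 = time.perf_counter()
sequential = [retrieve(q) for q in queries]
t_seq = time.perf_counter() - t0

# Multi-query parallelism: all queries are issued concurrently,
# so total latency stays close to a single round-trip.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(queries)) as pool:
    parallel = list(pool.map(retrieve, queries))
t_par = time.perf_counter() - t0

assert sequential == parallel  # same results, different wall-clock cost
print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

In the full framework the model itself decides when and what to retrieve during reasoning; this sketch only isolates the scheduling difference between the two modes.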