Retrieval-Augmented Generation (RAG) methods improve LLM performance by efficiently selecting the relevant context to pass to the model, which reduces hallucinations and inference cost. However, most existing RAG methods perform single-step retrieval, which is often insufficient for complex questions that require multi-step search. Recent multi-step retrieval approaches typically fine-tune small LLMs to carry out the retrieval. This fine-tuning is highly resource-intensive and does not allow larger LLMs to be used. In this work, we propose Q-RAG, a novel approach that instead fine-tunes the embedder model for multi-step retrieval using reinforcement learning (RL). Q-RAG offers a competitive, resource-efficient alternative to existing multi-step retrieval methods for open-domain question answering and achieves state-of-the-art results on the popular long-context benchmarks BabiLong and RULER for contexts of up to 10M tokens. Code is available at https://github.com/griver/Q-RAG