Selective retrieval aims to make retrieval-augmented generation (RAG) more efficient and reliable by skipping retrieval when an LLM's parametric knowledge suffices. Despite promising results, existing methods are constrained by a binary design choice: either retrieve from a single external source or skip retrieval and let the LLM directly produce the final answer. We argue that this fallback underestimates the model's knowledge and obscures the more general multi-source decision problem that arises in practical systems. We propose Self-Routing RAG (SR-RAG), which casts selective retrieval as knowledge source selection and treats the LLM itself as a first-class knowledge source. SR-RAG learns to select an appropriate knowledge source, optionally verbalize parametric knowledge, and answer using the selected source, all within a single left-to-right generation pass. SR-RAG further augments source selection by combining LLM-based uncertainty with a flexible external policy datastore to improve decision calibration. Across four benchmarks and three 7B-class LLMs, SR-RAG outperforms a strong selective retrieval baseline by 8.5%/2.1%/4.7% while performing 26%/40%/21% fewer retrievals, and it achieves favorable accuracy-latency trade-offs without dataset-specific threshold tuning.
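To make the routing idea concrete, below is a minimal Python sketch of how the inference-time decision described above could blend the two signals (LLM-based uncertainty over source choices and a nearest-neighbor vote from an external policy datastore) before continuing generation with the chosen source. All names here are hypothetical placeholders (`llm.source_token_probs`, `llm.embed`, `llm.verbalize_knowledge`, `llm.answer`, `datastore.nearest`, `retrieve`), not the paper's actual interface; this is an illustrative sketch, not the reference implementation.

```python
def route_and_answer(question, llm, datastore, retrieve, k=8, alpha=0.5):
    """Hypothetical sketch of SR-RAG-style knowledge source routing.

    The model scores special source tokens (e.g. "self" vs. an external
    corpus); that signal is blended with a similarity-weighted vote over the
    k nearest records in a policy datastore, and the winning source is then
    used in the same left-to-right generation pass.
    """
    # 1) LLM-based uncertainty: probability assigned to each source option,
    #    e.g. {"self": 0.6, "wiki": 0.4}. (Hypothetical method.)
    source_probs = llm.source_token_probs(question)

    # 2) Policy datastore: similarity-weighted vote among the k nearest
    #    stored routing decisions. (Hypothetical lookup interface.)
    neighbors = datastore.nearest(llm.embed(question), k=k)  # [(source, sim), ...]
    votes = {}
    for source, sim in neighbors:
        votes[source] = votes.get(source, 0.0) + sim
    total = sum(votes.values()) or 1.0
    neighbor_probs = {s: v / total for s, v in votes.items()}

    # 3) Blend the two signals and pick the highest-scoring source.
    sources = set(source_probs) | set(neighbor_probs)
    scores = {s: alpha * source_probs.get(s, 0.0)
                 + (1 - alpha) * neighbor_probs.get(s, 0.0) for s in sources}
    source = max(scores, key=scores.get)

    # 4) Continue the same generation pass conditioned on the chosen source:
    #    verbalize parametric knowledge for "self", otherwise retrieve.
    if source == "self":
        context = llm.verbalize_knowledge(question)
    else:
        context = retrieve(source, question)
    return llm.answer(question, context)
```

In this sketch, `alpha` controls how much the routing decision trusts the model's own uncertainty versus the datastore vote; swapping the datastore contents changes the routing policy without retraining the model, which is one way to read the abstract's "flexible external policy datastore".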