Frontier language models have demonstrated strong reasoning and long-horizon tool-use capabilities, yet existing RAG systems fail to leverage them. These systems still rely on two paradigms: (1) designing an algorithm that retrieves passages in a single shot and concatenates them into the model's input, or (2) predefining a workflow and prompting the model to execute it step by step. Neither paradigm allows the model to participate in retrieval decisions, which prevents performance from scaling efficiently as models improve. In this paper, we introduce A-RAG, an Agentic RAG framework that exposes hierarchical retrieval interfaces directly to the model. A-RAG provides three retrieval tools (keyword search, semantic search, and chunk read), enabling the agent to adaptively search for and retrieve information at multiple granularities. Experiments on multiple open-domain QA benchmarks show that A-RAG consistently outperforms existing approaches while retrieving a comparable or smaller number of tokens, demonstrating that it effectively leverages model capabilities and dynamically adapts to different RAG tasks. We further systematically study how A-RAG scales with model size and test-time compute. To facilitate future research, we release our code and evaluation suite at https://github.com/Ayanami0730/arag.