Retrieval for search agents is still inherited from non-agentic information retrieval: a retriever ranks the corpus, and the agent reads a small set of returned documents. Recent work on direct corpus interaction (DCI) shows that agents can instead interact with the raw corpus through shell tools such as grep and file reads. But unbounded interaction does not scale: every broad shell command scans the whole corpus, and latency degrades sharply as the corpus grows. We argue that the role of retrieval in agentic search is not just to select documents that fit in the LLM context window, but to construct an interaction space: a bounded subset of the corpus that the agent can explore with associated tools. Two design consequences follow: the space needs a boundary supplied by retrieval, and the objects within it should be processed for interaction. As a proof of concept, we propose RISE (Retrieving Interaction SpacE), which uses BM25 to construct the interaction space and processes its documents during indexing for shell-style navigation. On BrowseComp-Plus, RISE matches the pure-shell DCI baseline at 78% accuracy with gpt-5.4-mini, at roughly one quarter of the per-query cost. At 1M documents, RISE-BM25 reaches 81% with gpt-5.4-mini, whereas DCI with gpt-5.4-nano degrades to 60%, with 33 wall-clock failures out of 100.
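The idea of a retrieval-supplied boundary followed by shell-style interaction can be illustrated with a toy sketch. This is not the RISE implementation: the corpus, the function names `bm25_top_k` and `grep_space`, the tokenizer, and the BM25 parameters (`k1=1.2`, `b=0.75`, common defaults) are all assumptions made for illustration, and the indexing-time document processing described above is not modeled.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokenizer (an illustrative assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_top_k(query, docs, k=2, k1=1.2, b=0.75):
    """Return the ids of the k documents with the highest BM25 score.

    These ids form the boundary of the (hypothetical) interaction space.
    """
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score

    # Sort by descending score; break ties by id for determinism.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:k]


def grep_space(pattern, docs, space):
    """Grep-like search that only scans documents inside the space."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for doc_id in space:
        for line_no, line in enumerate(docs[doc_id].splitlines(), start=1):
            if rx.search(line):
                hits.append((doc_id, line_no, line))
    return hits


# Toy corpus, invented for this example.
CORPUS = {
    "d1": "Marie Curie won the Nobel Prize in Physics in 1903.\n"
          "She later won the Nobel Prize in Chemistry.",
    "d2": "The Eiffel Tower is in Paris.\nIt opened in 1889.",
    "d3": "Pierre Curie shared the 1903 Nobel Prize in Physics.",
    "d4": "Bananas are rich in potassium.",
}

space = bm25_top_k("Curie Nobel Prize", CORPUS, k=2)
hits = grep_space(r"chemistry", CORPUS, space)
outside = grep_space(r"1889", CORPUS, space)
```

In this sketch each grep scans only the `k` documents inside the boundary rather than the whole corpus, which is the scaling argument made above. The trade-off is also visible: a pattern that matches only documents outside the space (here `1889`) returns nothing, so the quality of the boundary matters.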