Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect the returned snippets, and iteratively reformulate until useful evidence emerges. This resembles how a newcomer searches an unfamiliar database rather than how an expert navigates it with strong priors about terminology and likely evidence, and it leads to unnecessary retrieval rounds, increased latency, and poor recall. We introduce the \textit{SuperIntelligent Retrieval Agent} (SIRA), which defines \emph{superintelligence} in retrieval as the ability to compress multi-round exploratory search into a single corpus-discriminative retrieval action. SIRA does not merely ask which terms are relevant to the query; it asks which terms are likely to separate the desired evidence from corpus-level confusers. On the corpus side, an LLM enriches each document offline with missing search vocabulary. On the query side, it predicts evidence vocabulary that the query omits, and it uses document-frequency statistics, exposed as a tool call, to filter out proposed terms that are absent from the corpus, overly common, or unlikely to create retrieval margin. The final retrieval step is a single weighted BM25 call that combines the original query with the validated expansion. Across ten BEIR benchmarks and downstream question-answering tasks, SIRA significantly outperforms dense retrievers and state-of-the-art multi-round agentic baselines, demonstrating that one well-formed lexical query, guided by LLM cognition and lightweight corpus statistics, can exceed substantially more expensive multi-round search while remaining interpretable, training-free, and efficient.
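The query-side steps described above (propose expansion terms, filter them with document-frequency statistics, then issue one weighted BM25 query) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the whitespace tokenizer, the Lucene-style BM25 formula and its parameters, the `max_df_ratio` threshold, the query and expansion weights, and the hard-coded `proposed` list that stands in for LLM output. Offline document enrichment is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for a real analyzer.
    return text.lower().split()


class BM25Index:
    """Tiny in-memory BM25 index (Lucene-style idf), for illustration only."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n
        self.tf = [Counter(d) for d in self.docs]
        # Document frequency: number of documents containing each term.
        self.df = Counter(t for d in self.docs for t in set(d))
        self.k1, self.b = k1, b

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, weighted_terms, doc_id):
        tf, dl = self.tf[doc_id], len(self.docs[doc_id])
        total = 0.0
        for term, weight in weighted_terms.items():
            f = tf.get(term, 0)
            if f == 0:
                continue
            length_norm = 1 - self.b + self.b * dl / self.avgdl
            total += weight * self.idf(term) * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total

    def search(self, weighted_terms, k=3):
        scored = [(self.score(weighted_terms, i), i) for i in range(self.n)]
        ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for s, doc_id in ranked if s > 0][:k]


def filter_terms(index, proposed, max_df_ratio=0.5):
    """Drop proposed terms that are absent from the corpus or too common.

    The threshold is a hypothetical stand-in for the paper's filtering criteria.
    """
    kept = []
    for term in proposed:
        df = index.df.get(term, 0)
        if df == 0 or df / index.n > max_df_ratio:
            continue
        kept.append(term)
    return kept


def build_query(query, expansion, query_weight=1.0, expansion_weight=0.5):
    """Combine original query terms and validated expansion terms into one weighted query."""
    weights = {}
    for term in tokenize(query):
        weights[term] = weights.get(term, 0.0) + query_weight
    for term in expansion:
        weights[term] = weights.get(term, 0.0) + expansion_weight
    return weights


if __name__ == "__main__":
    docs = [
        "aspirin reduces fever and pain in adults",
        "acetylsalicylic acid inhibits cyclooxygenase enzymes",
        "ibuprofen reduces pain and inflammation",
        "pain management guidelines for adults",
    ]
    index = BM25Index(docs)
    # Hypothetical LLM-proposed evidence vocabulary for the query below.
    proposed = ["cyclooxygenase", "pain", "prostaglandin"]
    kept = filter_terms(index, proposed)
    print("validated expansion:", kept)
    print("ranking:", index.search(build_query("how does aspirin work", kept)))
```

In this toy corpus the filter discards one term because it never occurs and another because it appears in most documents, so only the discriminative term reaches the single retrieval call. Down-weighting expansion terms relative to the original query is one plausible design choice for keeping the user's own wording dominant; the paper's actual weighting scheme is not specified in this abstract.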
