As agents operate over long horizons, their memory stores grow continuously, making retrieval critical for accessing relevant information. Many agent queries require reasoning-intensive retrieval, where the connection between a query and its relevant documents is implicit and must be bridged by inference. LLM-augmented pipelines address this through query expansion and candidate re-ranking, but at significant inference cost. We study computation allocation in reasoning-intensive retrieval pipelines using the BRIGHT benchmark and the Gemini 2.5 model family, varying model capacity, inference-time thinking, and re-ranking depth across the query-expansion and re-ranking stages. We find that re-ranking benefits substantially from stronger models (+7.5 NDCG@10) and from deeper candidate pools (+21% when moving from $k$=10 to $k$=100), while query expansion shows diminishing returns beyond lightweight models (+1.1 NDCG@10 from the weakest to the strongest model). Inference-time thinking provides minimal improvement at either stage. These results suggest that compute should be concentrated on re-ranking rather than distributed uniformly across pipeline stages.
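The two-stage pipeline described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the corpus, the term-overlap first-stage scorer, and the re-ranking score table are all stand-ins (the actual pipeline uses Gemini 2.5 models for expansion and re-ranking, evaluated on BRIGHT). The control flow it shows is the one under study: expand the query, retrieve a candidate pool of size $k$, re-rank the pool with a stronger scorer, and evaluate with NDCG@10.

```python
import math

def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k: discounted gain of a ranking, normalized by the ideal ordering."""
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    gains = [relevance.get(d, 0) for d in ranked_ids[:k]]
    ideal = dcg(sorted(relevance.values(), reverse=True)[:k])
    return dcg(gains) / ideal if ideal > 0 else 0.0

def retrieve(query_terms, corpus, k):
    """First-stage retrieval: rank documents by term overlap (toy scorer)."""
    scored = sorted(corpus.items(), key=lambda kv: -len(query_terms & kv[1]))
    return [doc_id for doc_id, _ in scored[:k]]

def rerank(candidates, scores):
    """Second stage: reorder the candidate pool with a stronger scorer.
    Here a fixed score table stands in for an LLM re-ranker."""
    return sorted(candidates, key=lambda d: -scores.get(d, 0.0))

# Toy corpus: doc id -> set of terms.
corpus = {"d1": {"heap", "sort"},
          "d2": {"priority", "queue"},
          "d3": {"graph", "search"}}

expanded = {"heap", "priority", "queue"}   # assumed query-expansion output
pool = retrieve(expanded, corpus, k=3)     # deeper pool -> more for the re-ranker
final = rerank(pool, {"d2": 0.9, "d1": 0.4, "d3": 0.1})
print(ndcg_at_k(final, {"d2": 1}, k=10))
```

The abstract's finding maps onto the two knobs visible here: `k` (candidate-pool depth, where deepening from 10 to 100 helped substantially) and the quality of the `rerank` scorer (where stronger models helped), versus the quality of the `expanded` query (where gains saturated quickly).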