As agents operate over long horizons, their memory stores grow continuously, making retrieval critical for accessing relevant information. Many agent queries require reasoning-intensive retrieval, where the connection between a query and its relevant documents is implicit and must be bridged by inference. LLM-augmented pipelines address this through query expansion and candidate re-ranking, but at significant inference cost. We study computation allocation in reasoning-intensive retrieval pipelines using the BRIGHT benchmark and the Gemini 2.5 model family, varying model capacity, inference-time thinking, and re-ranking depth across the query-expansion and re-ranking stages. We find that re-ranking benefits substantially from stronger models (+7.5 NDCG@10) and from deeper candidate pools (+21% when moving from $k$=10 to $k$=100), while query expansion shows diminishing returns beyond lightweight models (+1.1 NDCG@10 from the weakest to the strongest model). Inference-time thinking provides minimal improvement at either stage. These results suggest that compute should be concentrated on re-ranking rather than distributed uniformly across pipeline stages.
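The two-stage pipeline described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the corpus, the term-overlap first-stage scorer, and the re-ranking score table are all stand-ins (the actual pipeline uses Gemini 2.5 models for expansion and re-ranking, evaluated on BRIGHT). The control flow it shows is the one under study: expand the query, retrieve a candidate pool of size $k$, re-rank the pool with a stronger scorer, and evaluate with NDCG@10.

```python
import math

def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k: discounted gain of a ranking, normalized by the ideal ordering."""
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    gains = [relevance.get(d, 0) for d in ranked_ids[:k]]
    ideal = dcg(sorted(relevance.values(), reverse=True)[:k])
    return dcg(gains) / ideal if ideal > 0 else 0.0

def retrieve(query_terms, corpus, k):
    """First-stage retrieval: rank documents by term overlap (toy scorer)."""
    scored = sorted(corpus.items(), key=lambda kv: -len(query_terms & kv[1]))
    return [doc_id for doc_id, _ in scored[:k]]

def rerank(candidates, scores):
    """Second stage: reorder the candidate pool with a stronger scorer.
    Here a fixed score table stands in for an LLM re-ranker."""
    return sorted(candidates, key=lambda d: -scores.get(d, 0.0))

# Toy corpus: doc id -> set of terms.
corpus = {"d1": {"heap", "sort"},
          "d2": {"priority", "queue"},
          "d3": {"graph", "search"}}

expanded = {"heap", "priority", "queue"}   # assumed query-expansion output
pool = retrieve(expanded, corpus, k=3)     # deeper pool -> more for the re-ranker
final = rerank(pool, {"d2": 0.9, "d1": 0.4, "d3": 0.1})
print(ndcg_at_k(final, {"d2": 1}, k=10))
```

The abstract's finding maps onto the two knobs visible here: `k` (candidate-pool depth, where deepening from 10 to 100 helped substantially) and the quality of the `rerank` scorer (where stronger models helped), versus the quality of the `expanded` query (where gains saturated quickly).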