Deep research agents have achieved remarkable progress on complex information-seeking tasks. Even long ReAct-style rollouts explore only a single trajectory, whereas recent state-of-the-art systems scale inference-time compute via parallel search and aggregation. Yet deep research answers are assembled from complementary pieces of evidence, which parallel rollouts often duplicate rather than complete. This yields diminishing returns while pushing the aggregation context toward the model's context limit. We propose Argus, an agentic system in which a Searcher and a Navigator cooperate to treat deep research as assembling a jigsaw from complementary evidence pieces, rather than brute-forcing the whole answer in parallel. The Searcher collects evidence traces for a given sub-query through ReAct-style interaction. The Navigator maintains a shared evidence graph: it verifies which pieces are still missing, dispatches Searchers to gather them, and reasons over the completed graph to produce a source-traced final answer. We train the Navigator with reinforcement learning to verify, dispatch, and synthesize, and we train the Searcher independently so that it remains a standard ReAct agent. The resulting Navigator supports rollouts with a single Searcher or with many in parallel, without retraining. With both the Searcher and the Navigator built on a 35B-A3B MoE backbone, Argus gains 5.5 points with a single Searcher and 12.7 points with 8 parallel Searchers, averaged over eight benchmarks. With 64 Searchers, it reaches 86.2 on BrowseComp, surpassing every proprietary agent we benchmark, while the Navigator's reasoning context stays under 21.5K tokens.
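The abstract does not give the evidence graph's schema or the Navigator's control loop. The toy sketch below is therefore only an illustration under our own assumptions, not the Argus implementation. We assume graph nodes are sub-queries with dependency edges. We assume a Searcher is any callable that maps a sub-query to an evidence piece plus its source, standing in for a ReAct agent. Synthesis is plain concatenation rather than RL-trained reasoning. All names (`Evidence`, `EvidenceGraph`, `navigate`, `num_searchers`, `FAKE_WEB`) are hypothetical.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Evidence:
    text: str    # the evidence piece a Searcher extracted
    source: str  # where it came from, kept so the answer stays source-traced


@dataclass
class EvidenceGraph:
    # Hypothetical schema: each node is a sub-query ("jigsaw piece"); deps[q]
    # lists the pieces that must be found before q is dispatched.
    deps: dict[str, list[str]]
    found: dict[str, Evidence] = field(default_factory=dict)

    def missing(self) -> list[str]:
        """Verify step: unfound pieces whose dependencies are all found."""
        return [q for q, d in self.deps.items()
                if q not in self.found and all(p in self.found for p in d)]

    def complete(self) -> bool:
        return all(q in self.found for q in self.deps)


# Stand-in for a ReAct Searcher: sub-query in, evidence (or None) out.
Searcher = Callable[[str], Optional[Evidence]]


def navigate(graph: EvidenceGraph, searcher: Searcher,
             num_searchers: int = 1, max_rounds: int = 5) -> str:
    for _ in range(max_rounds):
        todo = graph.missing()
        if not todo:
            break  # nothing left that can be dispatched
        # Dispatch step: the same loop runs with one Searcher or many.
        with ThreadPoolExecutor(max_workers=num_searchers) as pool:
            results = list(pool.map(searcher, todo))
        for query, evidence in zip(todo, results):
            if evidence is not None:
                graph.found[query] = evidence
    # Synthesis step (toy): join pieces with their sources instead of
    # reasoning over them with a trained model.
    return "; ".join(f"{q}: {e.text} [{e.source}]"
                     for q, e in graph.found.items())


# Usage with a fake two-hop question and an offline "web".
FAKE_WEB = {
    "Who directed film X?": Evidence("Director D", "https://example.org/x"),
    "Where was Director D born?": Evidence("City C", "https://example.org/d"),
}
graph = EvidenceGraph(deps={
    "Who directed film X?": [],
    "Where was Director D born?": ["Who directed film X?"],
})
answer = navigate(graph, FAKE_WEB.get, num_searchers=8)
print(answer)
```

In this toy, changing `num_searchers` from 1 to 64 alters only the dispatch width and leaves the Navigator loop untouched. This loosely mirrors the claim that one Navigator serves both settings. The dependency edge is why the example needs two rounds: the second piece is dispatched only after the first has been found.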