Deep research agents have achieved remarkable progress on complex information-seeking tasks. Even long ReAct-style rollouts explore only a single trajectory, while recent state-of-the-art systems scale inference-time compute via parallel search and aggregation. Yet deep research answers are composed of complementary pieces of evidence, which parallel rollouts often duplicate rather than complete, yielding diminishing returns while pushing the aggregation context toward the model's limit. We propose Argus, an agentic system in which a Searcher and a Navigator cooperate to treat deep research as assembling a jigsaw from complementary evidence pieces, rather than brute-forcing the whole answer in parallel. The Searcher collects evidence traces for a given sub-query through ReAct-style interaction. The Navigator maintains a shared evidence graph, verifying which pieces are still missing, dispatching Searchers to gather them, and reasoning over the completed graph to produce a source-traced final answer. We train the Navigator with reinforcement learning to verify, dispatch, and synthesize, while independently training the Searcher to remain a standard ReAct agent. The resulting Navigator supports rollouts with a single Searcher or many in parallel without retraining. With both Searcher and Navigator built on a 35B-A3B MoE backbone, Argus gains 5.5 points with a single Searcher and 12.7 points with 8 parallel Searchers, averaged over eight benchmarks. With 64 Searchers, it reaches 86.2 on BrowseComp, surpassing every proprietary agent we benchmark, while the Navigator's reasoning context stays under 21.5K tokens.
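The verify, dispatch, and synthesize loop described above can be illustrated with a minimal sketch. This is not the Argus implementation: every name here (`Evidence`, `EvidenceGraph`, `navigate`, `synthesize`, `toy_searcher`, `TOY_CORPUS`) is hypothetical, the sub-queries are fixed up front rather than chosen by a learned Navigator, and the Searcher is a dictionary lookup standing in for a ReAct agent.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Evidence:
    claim: str   # what the Searcher found
    source: str  # where it was found, kept for source tracing


@dataclass
class EvidenceGraph:
    # Hypothetical simplification: each required piece is keyed by its
    # sub-query and holds None until some Searcher fills it in.
    pieces: Dict[str, Optional[Evidence]] = field(default_factory=dict)

    def missing(self) -> List[str]:
        """Verify step: list the sub-queries that still lack evidence."""
        return [q for q, ev in self.pieces.items() if ev is None]


Searcher = Callable[[str], Optional[Evidence]]


def navigate(sub_queries: List[str], searcher: Searcher,
             num_searchers: int = 1, max_rounds: int = 3) -> EvidenceGraph:
    """Fill the graph by dispatching Searchers only for missing pieces."""
    graph = EvidenceGraph({q: None for q in sub_queries})
    for _ in range(max_rounds):
        todo = graph.missing()
        if not todo:
            break
        # Dispatch step: the same loop works with 1 or many workers.
        with ThreadPoolExecutor(max_workers=num_searchers) as pool:
            results = list(pool.map(searcher, todo))
        for query, evidence in zip(todo, results):
            if evidence is not None:
                graph.pieces[query] = evidence
    return graph


def synthesize(graph: EvidenceGraph) -> str:
    """Synthesize step: join collected claims, each tagged with its source."""
    found = [ev for ev in graph.pieces.values() if ev is not None]
    return "\n".join(f"{ev.claim} [{ev.source}]" for ev in found)


# Toy stand-in for a ReAct Searcher (assumption: a static lookup table).
TOY_CORPUS = {
    "who founded X": Evidence("X was founded by A", "doc-1"),
    "when was X founded": Evidence("X was founded in 1990", "doc-2"),
}


def toy_searcher(query: str) -> Optional[Evidence]:
    return TOY_CORPUS.get(query)


if __name__ == "__main__":
    graph = navigate(
        ["who founded X", "when was X founded", "where is X based"],
        toy_searcher,
        num_searchers=2,
    )
    print(synthesize(graph))
    print("still missing:", graph.missing())
```

In this sketch, each round dispatches Searchers only for pieces the graph still lacks, so added parallelism goes toward completing the answer rather than repeating evidence already collected. The `num_searchers` argument changes only the worker count, loosely mirroring the claim that the same Navigator runs with one Searcher or many.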