S1-DeepResearch: Beyond Search, Toward Real-World Long-Horizon Research Agents

Deep research agents aim to solve complex knowledge-intensive tasks through long-horizon planning, evidence gathering, reasoning, and report generation. While recent progress in search agents has demonstrated strong capabilities in information retrieval and answer verification, most existing training datasets remain search-centric, focusing primarily on closed-ended question answering and information localization. As a result, they mainly train information-seeking behavior while providing limited coverage of key deep research capabilities, including evidence integration, knowledge synthesis, planning, file understanding, and structured report generation. In this work, we propose a unified trajectory construction paradigm for deep research agents that combines closed-ended QA and open-ended exploration. The proposed framework consists of graph-grounded task formulation, agentic trajectory rollout, and multi-dimensional trajectory verification, enabling scalable synthesis of high-quality agentic trajectories spanning long-chain complex reasoning, deep research instruction following, report writing, file understanding and generation, and skills usage. Compared with existing search-oriented datasets, our synthesized trajectories place greater emphasis on knowledge synthesis, complex reasoning, and planning. S1-DeepResearch-32B achieves state-of-the-art performance among open-source models of comparable scale across 20 benchmarks spanning five capability dimensions: complex reasoning, instruction following, report generation, file understanding, and skills usage. On several challenging deep research benchmarks, it approaches the performance of leading proprietary frontier models. These results highlight the importance of jointly modeling information acquisition, knowledge synthesis, and planning-oriented agent behaviors for building effective deep research agents.
