Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet their development remains dominated by industrial giants. The typical industry recipe is a highly resource-intensive pipeline spanning pre-training, continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL). In this report, we show that, when fueled with informative and high-difficulty trajectories, a simple SFT approach can be surprisingly powerful for training frontier search agents. By introducing three simple data synthesis modifications (scaling the knowledge graph size for richer exploration, expanding the tool set size for broader functionality, and applying strict low-step filtering), we establish a stronger baseline. Trained on merely 10.6k data points, our OpenSeeker-v2 achieves state-of-the-art performance across four benchmarks among 30B-sized agents using the ReAct paradigm: 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity's Last Exam, and 78.0% on xbench. It surpasses even Tongyi DeepResearch, which is trained with a heavy CPT+SFT+RL pipeline and achieves 43.4%, 46.7%, 32.9%, and 75.0%, respectively. Notably, OpenSeeker-v2 is the first state-of-the-art search agent within its model scale and paradigm to be developed by a purely academic team using only SFT. We are excited to open-source the OpenSeeker-v2 model weights and share our simple yet effective findings to make frontier search agent research more accessible to the community.
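As an illustration of the strict low-step filtering mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes each synthesized trajectory records its ReAct tool calls as a list, and that "low-step" means fewer tool calls than some cutoff. The `Trajectory` class, the `filter_low_step` function, and the `min_steps` value are hypothetical names and parameters introduced only for this example; the report's abstract does not state the actual threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """Hypothetical container for one synthesized search trajectory."""
    question: str
    # One entry per ReAct tool-call step (assumed representation).
    tool_calls: list = field(default_factory=list)


def filter_low_step(trajectories, min_steps):
    """Keep only trajectories with at least `min_steps` tool calls.

    `min_steps` is an assumed cutoff chosen by the caller.
    """
    return [t for t in trajectories if len(t.tool_calls) >= min_steps]


samples = [
    Trajectory("easy question", ["search"]),
    Trajectory("hard question", ["search", "visit", "search", "visit", "search"]),
]

# With an illustrative cutoff of 3, only the longer trajectory is kept.
kept = filter_low_step(samples, min_steps=3)
print([t.question for t in kept])
```

The intuition behind such a filter is that trajectories solved in very few steps tend to come from easier questions, so discarding them biases the SFT set toward the high-difficulty data the abstract emphasizes.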