Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet the development of high-performance search agents remains dominated by industrial giants due to a lack of transparent, high-quality training data. This persistent data scarcity has fundamentally hindered the broader research community's progress in developing and innovating within this domain. To bridge this gap, we introduce OpenSeeker, the first fully open-source search agent (i.e., both model and data) that achieves frontier-level performance through two core technical innovations: (1) fact-grounded, scalable, and controllable QA synthesis, which reverse-engineers the web graph via topological expansion and entity obfuscation to generate complex, multi-hop reasoning tasks with controllable coverage and complexity; and (2) denoised trajectory synthesis, which employs a retrospective summarization mechanism to denoise trajectories, thereby guiding the teacher LLMs to generate high-quality actions. Experimental results demonstrate that OpenSeeker, trained in a single run on only 11.7k synthesized samples, achieves state-of-the-art performance across multiple benchmarks, including BrowseComp, BrowseComp-ZH, xbench-DeepSearch, and WideSearch. Notably, trained with simple SFT, OpenSeeker significantly outperforms the second-best fully open-source agent, DeepDive (e.g., 29.5% vs. 15.3% on BrowseComp), and even surpasses industrial competitors such as Tongyi DeepResearch (trained via extensive continual pre-training, SFT, and RL) on BrowseComp-ZH (48.4% vs. 46.7%). We fully open-source the complete training dataset and the model weights to democratize frontier search agent research and foster a more transparent, collaborative ecosystem.
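To make the entity-obfuscation idea behind the QA synthesis concrete, the following is a minimal, hypothetical sketch: a seed fact's subject entity is replaced by an indirect description, so a solver must first resolve the entity via search before it can answer. All entity names, the seed facts, and the helper function are invented for illustration; the paper's actual pipeline operates over a web graph with topological expansion.

```python
# Hypothetical sketch of entity-obfuscated QA synthesis (illustrative only).
# Seed facts and obfuscations here are toy stand-ins for web-graph data.

SEED_FACTS = {
    "Marie Curie": {"born": "1867", "award": "two Nobel Prizes"},
}

OBFUSCATIONS = {
    # Replacing the direct mention with an indirect description adds a
    # reasoning hop: the solver must first identify the entity.
    "Marie Curie": "the scientist who won Nobel Prizes in two different sciences",
}

def synthesize_qa(entity: str, attribute: str) -> tuple[str, str]:
    """Build a QA pair whose subject entity is hidden behind a description."""
    alias = OBFUSCATIONS.get(entity, entity)
    question = f"In what year was {alias} born?" if attribute == "born" \
        else f"What is the {attribute} of {alias}?"
    answer = SEED_FACTS[entity][attribute]
    return question, answer

q, a = synthesize_qa("Marie Curie", "born")
print(q)  # the question never names the entity directly
print(a)  # ground-truth answer recoverable from the seed fact
```

Chaining several such obfuscations along paths in the web graph is what yields the controllable multi-hop complexity the abstract refers to.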