The performance gap between closed-source and open-source large language models (LLMs) is largely attributed to disparities in access to high-quality training data. To bridge this gap, we introduce a novel framework for the automated synthesis of sophisticated, research-grade instructional data. Our approach centers on a multi-agent workflow in which collaborative AI agents simulate complex tool-integrated reasoning to generate diverse, high-fidelity data end-to-end. Leveraging this synthesized data, we develop a two-stage training strategy that combines supervised fine-tuning with a novel reinforcement learning method, designed to maximize model alignment and capability. Extensive experiments demonstrate that our framework empowers open-source models across multiple scales, enabling them to achieve new state-of-the-art performance on a major deep research benchmark. This work provides a scalable and effective pathway for advancing open-source LLMs without relying on proprietary data or models.