Large language models are transitioning from general-purpose knowledge engines to real-world problem solvers, yet optimizing them for deep search tasks remains challenging. The central bottleneck lies in the extreme sparsity of high-quality search trajectories and reward signals, arising from the difficulty of scalable long-horizon task construction and the high cost of interaction-heavy rollouts involving external tool calls. To address these challenges, we propose REDSearcher, a unified framework that co-designs complex task synthesis, mid-training, and post-training for scalable search-agent optimization. Specifically, REDSearcher introduces the following improvements: (1) We frame task synthesis as a dual-constrained optimization, where task difficulty is precisely governed by graph topology and evidence dispersion, allowing scalable generation of complex, high-quality tasks. (2) We introduce tool-augmented queries to encourage proactive tool use rather than passive recall. (3) During mid-training, we strengthen core atomic capabilities (knowledge, planning, and function calling), substantially reducing the cost of collecting high-quality trajectories for downstream training. (4) We build a local simulated environment that enables rapid, low-cost algorithmic iteration for reinforcement learning experiments. Across both text-only and multimodal search-agent benchmarks, our approach achieves state-of-the-art performance. To facilitate future research on long-horizon search agents, we will release 10K high-quality complex text search trajectories, 5K multimodal trajectories, and a 1K text RL query set, together with code and model checkpoints.