As code large language models (LLMs) evolve into tool-interactive agents via the Model Context Protocol (MCP), their generalization is increasingly limited by low-quality synthetic data and the diminishing returns of quantity scaling: quantity-centric scaling hits an early bottleneck that leaves trajectory data underutilized. We propose TDScaling, a Trajectory Diversity Scaling-based data synthesis framework for code agents that scales performance through diversity rather than raw volume. Under a fixed training budget, increasing trajectory diversity yields larger gains than adding more trajectories, improving the performance-cost trade-off of agent training. TDScaling integrates four innovations: (1) a Business Cluster mechanism that captures the logical dependencies of real services; (2) a blueprint-driven multi-agent paradigm that enforces trajectory coherence; (3) an adaptive evolution mechanism that steers synthesis toward long-tail scenarios, using Domain Entropy, Reasoning Mode Entropy, and Cumulative Action Complexity to prevent mode collapse; and (4) a sandboxed code tool that mitigates catastrophic forgetting of intrinsic coding capabilities. Experiments on general tool-use benchmarks (BFCL, tau^2-Bench) and code agent tasks (RebenchT, CodeCI, BIRD) demonstrate a win-win outcome: TDScaling improves both tool-use generalization and inherent coding proficiency. We plan to release the full codebase and the synthesized dataset (including 30,000+ tool clusters) upon publication.
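The adaptive evolution mechanism mentioned above scores a trajectory pool by categorical entropies and action complexity, then favors candidates that raise that score. The sketch below is a minimal illustration, not the paper's implementation: the field names (`domain`, `reasoning_mode`, `actions`), the use of trajectory length as Cumulative Action Complexity, and the unweighted sum are all assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) over a list of categorical labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def diversity_score(trajectories):
    """Score a trajectory pool by Domain Entropy, Reasoning Mode Entropy,
    and mean Cumulative Action Complexity (approximated here by the number
    of actions per trajectory). The equal weighting is illustrative only."""
    domain_h = shannon_entropy([t["domain"] for t in trajectories])
    mode_h = shannon_entropy([t["reasoning_mode"] for t in trajectories])
    avg_complexity = sum(len(t["actions"]) for t in trajectories) / len(trajectories)
    return domain_h + mode_h + avg_complexity


def select_next(pool, candidates):
    """Greedy step: pick the candidate whose addition maximizes pool diversity,
    steering synthesis toward underrepresented (long-tail) scenarios."""
    return max(candidates, key=lambda c: diversity_score(pool + [c]))
```

With a pool dominated by one domain, `select_next` prefers a candidate from an unseen domain, since it raises Domain Entropy while the other terms stay equal.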