As code large language models (LLMs) evolve into tool-interactive agents via the Model Context Protocol (MCP), their generalization is increasingly limited by low-quality synthetic data and the diminishing returns of quantity scaling: quantity-centric scaling hits an early bottleneck that leaves trajectory data underutilized. We propose TDScaling, a Trajectory Diversity Scaling-based data synthesis framework for code agents that scales performance through diversity rather than raw volume. Under a fixed training budget, increasing trajectory diversity yields larger gains than adding more trajectories, improving the performance-cost trade-off of agent training. TDScaling integrates four innovations: (1) a Business Cluster mechanism that captures the logical dependencies of real services; (2) a blueprint-driven multi-agent paradigm that enforces trajectory coherence; (3) an adaptive evolution mechanism that steers synthesis toward long-tail scenarios, using Domain Entropy, Reasoning Mode Entropy, and Cumulative Action Complexity to prevent mode collapse; and (4) a sandboxed code tool that mitigates catastrophic forgetting of intrinsic coding capabilities. Experiments on general tool-use benchmarks (BFCL, tau^2-Bench) and code agent tasks (RebenchT, CodeCI, BIRD) demonstrate a win-win outcome: TDScaling improves both tool-use generalization and inherent coding proficiency. We plan to release the full codebase and the synthesized dataset (including 30,000+ tool clusters) upon publication.
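The adaptive evolution mechanism mentioned above scores a trajectory pool by categorical entropies and action complexity, then favors candidates that raise that score. The sketch below is a minimal illustration, not the paper's implementation: the field names (`domain`, `reasoning_mode`, `actions`), the use of trajectory length as Cumulative Action Complexity, and the unweighted sum are all assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) over a list of categorical labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def diversity_score(trajectories):
    """Score a trajectory pool by Domain Entropy, Reasoning Mode Entropy,
    and mean Cumulative Action Complexity (approximated here by the number
    of actions per trajectory). The equal weighting is illustrative only."""
    domain_h = shannon_entropy([t["domain"] for t in trajectories])
    mode_h = shannon_entropy([t["reasoning_mode"] for t in trajectories])
    avg_complexity = sum(len(t["actions"]) for t in trajectories) / len(trajectories)
    return domain_h + mode_h + avg_complexity


def select_next(pool, candidates):
    """Greedy step: pick the candidate whose addition maximizes pool diversity,
    steering synthesis toward underrepresented (long-tail) scenarios."""
    return max(candidates, key=lambda c: diversity_score(pool + [c]))
```

With a pool dominated by one domain, `select_next` prefers a candidate from an unseen domain, since it raises Domain Entropy while the other terms stay equal.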