Agentic task-solving with Large Language Models (LLMs) requires multi-turn, multi-step interactions, often involving complex function calls and dynamic user-agent exchanges. Existing simulation-based data generation methods for such scenarios rely heavily on costly autoregressive interactions among multiple LLM agents, which limits the practical efficiency of agentic data generation. In this paper, we propose ToolACE-MT, a novel Non-Autoregressive Iterative Generation framework for constructing high-quality multi-turn agentic dialogues. ToolACE-MT generates full conversational trajectories in three stages: coarse-grained initialization, iterative refinement, and offline verification. The initialization stage builds a structurally complete yet semantically coarse dialogue skeleton; the iterative refinement stage injects realistic complexities and progressively polishes the dialogue via mask-and-fill operations; and the offline verification stage ensures correctness and coherence through rule- and model-based checks. Experiments demonstrate that ToolACE-MT enables efficient, effective, and generalizable agentic data generation, offering a new paradigm for high-quality data construction in tool-augmented LLM scenarios.
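The three-stage pipeline described above can be sketched as follows. This is a minimal illustrative mock, not the authors' implementation: all function names (`initialize_skeleton`, `mask_and_fill`, `verify`) and the placeholder/mask tokens are assumptions, and `fill_fn` stands in for an LLM call that rewrites a masked turn given the full dialogue context.

```python
# Hypothetical sketch of a non-autoregressive three-stage dialogue
# generator in the spirit of ToolACE-MT; all names are illustrative.
import random


def initialize_skeleton(num_turns):
    # Stage 1: coarse-grained initialization — a structurally complete
    # but semantically coarse dialogue skeleton with placeholder turns.
    return [{"role": "user" if i % 2 == 0 else "assistant",
             "content": "<placeholder>"} for i in range(num_turns)]


def mask_and_fill(dialogue, fill_fn):
    # Stage 2: iterative refinement — mask one turn and regenerate it
    # conditioned on the entire dialogue (not just the prefix), which is
    # what makes the process non-autoregressive.
    idx = random.randrange(len(dialogue))
    masked = dict(dialogue[idx], content="<mask>")
    context = dialogue[:idx] + [masked] + dialogue[idx + 1:]
    dialogue[idx]["content"] = fill_fn(context, idx)
    return dialogue


def verify(dialogue):
    # Stage 3: offline verification — rule-based checks shown here;
    # a model-based judge could be layered on top.
    roles_alternate = all(a["role"] != b["role"]
                          for a, b in zip(dialogue, dialogue[1:]))
    no_masks_left = all(t["content"] not in ("<placeholder>", "<mask>")
                        for t in dialogue)
    return roles_alternate and no_masks_left


def generate(num_turns, fill_fn, refine_steps=8):
    dialogue = initialize_skeleton(num_turns)
    # First pass fills every placeholder; later passes keep refining
    # randomly chosen turns via mask-and-fill.
    for i in range(len(dialogue)):
        dialogue[i]["content"] = fill_fn(dialogue, i)
    for _ in range(refine_steps):
        dialogue = mask_and_fill(dialogue, fill_fn)
    return dialogue if verify(dialogue) else None
```

In practice `fill_fn` would be a single batched LLM call per refinement pass; the key efficiency point from the abstract is that turns are filled and revised in place over a fixed skeleton, rather than rolled out one agent reply at a time.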