The recent paradigm shift toward large reasoning models (LRMs) as autonomous agents has intensified the demand for sophisticated multi-turn tool-use capabilities. Yet existing datasets and data-generation approaches are limited by static, predefined toolsets that cannot scale to the complexity of open-ended human-agent collaboration. To address this, we first developed a framework for automated, task-oriented multi-turn dialogue generation at scale, using an LRM-based simulator to dynamically generate high-value, domain-specific tools for solving specified tasks. However, we observe that a purely task-oriented design often produces "solely task-solving" trajectories, in which the agent completes the objective with minimal interaction and therefore fails to yield the high turn-count conversations seen in realistic scenarios. To bridge this gap, we shift toward a user-oriented simulation paradigm. By decoupling task generation from a dedicated user simulator that mimics human behavioral rules, such as incremental request-making and turn-by-turn feedback, we facilitate more authentic, extended multi-turn dialogues that reflect the iterative nature of real-world problem solving. Our generation pipeline operates as a versatile, plug-and-play module that can initiate generation from any state, which makes it highly scalable for producing extended tool-use data. Furthermore, by supporting multiple task completions within a single trajectory, it yields a high-density dataset that reflects the multifaceted demands of real-world human-agent interaction.
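The user-oriented loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `DialogueState`, `user_turn`, `agent_turn`, and `generate` are hypothetical, the user simulator is reduced to a single rule (voice one pending sub-request per turn), and the agent is a stub standing in for a tool-calling LRM. The sketch only shows how incremental requests lengthen a trajectory and how generation can resume from any saved state.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Hypothetical container: the turns so far plus what the user still wants."""
    turns: list = field(default_factory=list)    # (role, text) pairs
    pending: list = field(default_factory=list)  # sub-requests not yet voiced


def user_turn(state):
    """Assumed behavioral rule: voice exactly one pending sub-request per turn."""
    if not state.pending:
        return None  # nothing left to ask, so the dialogue ends
    request = state.pending.pop(0)
    state.turns.append(("user", request))
    return request


def agent_turn(state, request):
    """Stub agent; a real pipeline would call an LRM with tools here."""
    state.turns.append(("assistant", f"done: {request}"))


def generate(state, max_turns=20):
    """Alternate user and agent turns, starting from whatever state is passed in."""
    while len(state.turns) < max_turns:
        request = user_turn(state)
        if request is None:
            break
        agent_turn(state, request)
    return state


# Fresh start: three sub-requests released one at a time.
fresh = generate(DialogueState(pending=["find flights", "filter by price", "book one"]))

# Resume: continue a partially completed dialogue with a new task appended.
resumed = generate(DialogueState(
    turns=[("user", "find flights"), ("assistant", "done: find flights")],
    pending=["now book a hotel"],
))
```

Because `generate` takes an arbitrary `DialogueState`, the same function serves both a fresh start and a continuation, and appending further sub-requests to `pending` places several tasks in one trajectory.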