Automatically generating agentic workflows (executable operator graphs or code that orchestrate reasoning, verification, and repair) has become a practical way to solve complex tasks beyond what single-pass LLM generation can reliably handle. Yet what constitutes a good workflow depends heavily on the task distribution and the available operators. Under domain shift, current systems typically rely on iterative workflow refinement to discover a feasible workflow in a large workflow space, which incurs high iteration costs and yields unstable, domain-specific behavior. In response, we internalize a decompose-recompose-decide mechanism into an open-source LLM for cross-domain workflow generation. To decompose, we learn a compact set of reusable workflow capabilities across diverse domains. To recompose, we map each input task to a sparse composition over these capability bases and generate a task-specific workflow in a single pass. To decide, we attribute the success or failure of workflow generation to the counterfactual contributions of the learned capabilities, capturing which capabilities actually drive success through their marginal effects. Across stringent multi-domain, cross-domain, and unseen-domain evaluations, our single-pass generator surpasses state-of-the-art refinement baselines that consume 20 iterations, while substantially reducing generation latency and cost.
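To make the recompose and decide steps concrete, the toy sketch below may help. It is not the authors' implementation: the capability names, the top-k selection rule, the leave-one-out form of counterfactual attribution, and the `toy_evaluate` success function are all assumptions chosen for illustration. In the paper these steps are internalized in an LLM rather than written as explicit functions.

```python
from typing import Callable, Dict, FrozenSet, Sequence


def sparse_composition(scores: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k highest-scoring capabilities and renormalize their weights.

    Assumes scores are non-negative and the kept scores sum to a positive value.
    """
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(weight for _, weight in top)
    return {name: weight / total for name, weight in top}


def leave_one_out_attribution(
    active: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Marginal effect of each capability: full score minus score without it."""
    full_set = frozenset(active)
    full_score = evaluate(full_set)
    return {c: full_score - evaluate(full_set - {c}) for c in active}


def toy_evaluate(capabilities: FrozenSet[str]) -> float:
    """Hypothetical success signal: the task needs both planning and verification."""
    return 1.0 if {"plan", "verify"} <= capabilities else 0.0


# Hypothetical relevance scores of four learned capabilities for one task.
scores = {"plan": 0.5, "verify": 0.3, "repair": 0.15, "retrieve": 0.05}

weights = sparse_composition(scores, k=3)          # recompose: sparse mix of bases
attribution = leave_one_out_attribution(list(weights), toy_evaluate)  # decide

print(weights)
print(attribution)
```

In this toy setting, removing `plan` or `verify` flips the outcome from success to failure, so each receives a marginal effect of 1.0, while `repair` is selected but contributes nothing. That is the kind of signal a counterfactual attribution is meant to surface.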