LLMs can solve complex tasks through reasoning and tool use, but accurately translating these solutions into structured workflows remains challenging. We model a workflow as a sequence of tool uses and reformulate the problem as designing a mechanism that can both solve tasks and reliably construct workflows. Prior approaches that build the workflow during execution often produce inaccurate workflows because the two processes interfere with each other. We propose an Execute-Summarize (ES) framework that decouples task execution from workflow construction: the model first completes the task using the available tools, then independently reconstructs a structured workflow from the execution traces. This separation improves workflow accuracy and robustness. We introduce the FlowBench benchmark and show through extensive experiments that our approach outperforms existing methods, providing a reliable paradigm for grounding free-form LLM reasoning in structured workflows.
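The decoupling of execution from workflow construction can be illustrated with a minimal sketch. It is not the authors' implementation: a fixed, hypothetical `PLAN` stands in for the LLM's tool-calling decisions, and `summarize` is a simple filter over the trace, whereas the framework described above has the model itself reconstruct the workflow. All names (`TraceEntry`, `WorkflowStep`, `execute`, `summarize`, `TOOLS`, `PLAN`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TraceEntry:
    """One recorded tool call from the execution phase (hypothetical schema)."""
    tool: str
    args: dict
    result: Any
    ok: bool


@dataclass
class WorkflowStep:
    """One step of the reconstructed workflow: a tool and its arguments."""
    tool: str
    args: dict


def execute(plan: list, tools: dict) -> list:
    """Phase 1: run the tool calls and record every attempt, failures included.

    No workflow is built here, so execution is not disturbed by construction.
    """
    trace = []
    for tool_name, args in plan:
        try:
            result = tools[tool_name](**args)
            trace.append(TraceEntry(tool_name, args, result, ok=True))
        except Exception as exc:  # keep failed attempts in the trace
            trace.append(TraceEntry(tool_name, args, repr(exc), ok=False))
    return trace


def summarize(trace: list) -> list:
    """Phase 2: rebuild the workflow from the trace alone.

    Assumption: dropping failed calls is a stand-in for the model-driven
    summarization step; it only shows that this phase reads the trace
    and nothing else.
    """
    return [WorkflowStep(entry.tool, entry.args) for entry in trace if entry.ok]


# Hypothetical tools and a scripted plan standing in for the LLM's decisions.
TOOLS: dict[str, Callable[..., Any]] = {
    "search": lambda query: f"results for {query}",
    "divide": lambda a, b: a / b,
}

PLAN = [
    ("search", {"query": "exchange rate"}),
    ("divide", {"a": 10, "b": 0}),  # fails with ZeroDivisionError
    ("divide", {"a": 10, "b": 4}),  # corrected retry
]

trace = execute(PLAN, TOOLS)
workflow = summarize(trace)
```

In this sketch the trace keeps all three attempts, while the reconstructed workflow keeps only the two calls that succeeded, so the failed attempt made during execution does not leak into the final workflow.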