ComfyUI provides a widely adopted, workflow-based interface that lets users customize image generation tasks through an intuitive node-based architecture. However, the intricate connections between nodes and the diversity of modules often present a steep learning curve. In this paper, we introduce ComfyGPT, the first self-optimizing multi-agent system designed to automatically generate ComfyUI workflows from task descriptions. ComfyGPT comprises four specialized agents: ReformatAgent, FlowAgent, RefineAgent, and ExecuteAgent. Its core innovation lies in two aspects. First, it generates individual node links rather than entire workflows, which significantly improves generation precision. Second, we propose FlowAgent, an LLM-based workflow generation agent that uses both supervised fine-tuning (SFT) and reinforcement learning (RL) to improve workflow generation accuracy. Moreover, we introduce FlowDataset, a large-scale dataset containing 13,571 workflow-description pairs, and FlowBench, a comprehensive benchmark for evaluating workflow generation systems. We also propose four novel evaluation metrics: Format Validation (FV), Pass Accuracy (PA), Pass Instruct Alignment (PIA), and Pass Node Diversity (PND). Experimental results demonstrate that ComfyGPT significantly outperforms existing LLM-based methods in workflow generation.
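To make the link-level idea in the abstract concrete, the following minimal sketch shows how a flat list of node links could be assembled into a workflow graph. It is an illustration under stated assumptions, not the representation ComfyGPT actually uses: the `Link` fields and the `links_to_workflow` helper are hypothetical names, and the output only resembles ComfyUI's API-style JSON, in which each node has a `class_type` and an `inputs` map whose linked entries are `[source_node_id, output_index]` pairs.

```python
from typing import Dict, List, NamedTuple


class Link(NamedTuple):
    """One directed connection between two nodes (hypothetical schema)."""
    src_id: str      # id of the node producing the value
    src_type: str    # class name of the producing node
    src_output: int  # index of the output slot on the producing node
    dst_id: str      # id of the node consuming the value
    dst_type: str    # class name of the consuming node
    dst_input: str   # name of the input slot on the consuming node


def links_to_workflow(links: List[Link]) -> Dict[str, dict]:
    """Assemble a node graph from individual links.

    Each node is created the first time it appears in a link. A link that
    reuses a node id with a different class name is rejected, which is the
    kind of per-link check that is easy to apply when links are generated
    one at a time.
    """
    workflow: Dict[str, dict] = {}
    for link in links:
        src = workflow.setdefault(
            link.src_id, {"class_type": link.src_type, "inputs": {}}
        )
        dst = workflow.setdefault(
            link.dst_id, {"class_type": link.dst_type, "inputs": {}}
        )
        if src["class_type"] != link.src_type:
            raise ValueError(f"node {link.src_id} has conflicting class types")
        if dst["class_type"] != link.dst_type:
            raise ValueError(f"node {link.dst_id} has conflicting class types")
        # Record the connection on the consuming node's input slot.
        dst["inputs"][link.dst_input] = [link.src_id, link.src_output]
    return workflow


if __name__ == "__main__":
    # A tiny text-to-image fragment; node ids are arbitrary.
    example_links = [
        Link("1", "CheckpointLoaderSimple", 1, "2", "CLIPTextEncode", "clip"),
        Link("1", "CheckpointLoaderSimple", 0, "3", "KSampler", "model"),
        Link("2", "CLIPTextEncode", 0, "3", "KSampler", "positive"),
    ]
    for node_id, node in links_to_workflow(example_links).items():
        print(node_id, node["class_type"], node["inputs"])
```

Because every link names both of its endpoints, each one can be validated in isolation (for example, against a table of known node classes and slots) before the graph is assembled. This is one plausible reason a link-level output is easier to check than a whole workflow emitted at once.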