Diffusion models have achieved remarkable success in image and video generation. However, their inherently multi-step inference process imposes substantial computational overhead, hindering real-world deployment. Accelerating diffusion models is therefore essential, yet determining how to combine multiple acceleration techniques remains a significant challenge. To address this issue, we introduce a framework driven by large language models (LLMs) for automated acceleration code generation and evaluation. First, we present DiffBench, a comprehensive benchmark that implements a three-stage automated evaluation pipeline across diverse diffusion architectures, optimization combinations, and deployment scenarios. Second, we propose DiffAgent, an agent that generates optimal acceleration strategies and code for arbitrary diffusion models. DiffAgent employs a closed-loop workflow in which a planning component and a debugging component iteratively refine the output of a code-generation component, while a genetic algorithm extracts performance feedback from the execution environment to guide subsequent code refinements. We detail the construction of DiffBench and the design principles underlying DiffAgent. Extensive experiments show that DiffBench provides a thorough evaluation of generated code and that DiffAgent significantly outperforms existing LLMs in producing effective diffusion acceleration strategies.
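The genetic-algorithm feedback loop described above can be illustrated with a minimal sketch. The search space, fitness model, and all identifiers below are hypothetical stand-ins (not DiffAgent's actual configuration schema or evaluation harness): candidate acceleration configurations are evolved, with a mock fitness function playing the role of executing the generated code and measuring latency/quality in the real environment.

```python
import random

# Hypothetical search space of acceleration knobs (illustrative only;
# not DiffAgent's real configuration schema).
SEARCH_SPACE = {
    "num_steps": [10, 20, 30, 50],       # sampler step count
    "use_cache": [True, False],          # feature caching on/off
    "quantize": ["none", "int8", "fp8"], # weight quantization mode
}

def random_candidate(rng):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def mock_fitness(cand):
    # Stand-in for running the generated acceleration code and collecting
    # performance feedback from the execution environment.
    latency = cand["num_steps"] * (0.5 if cand["use_cache"] else 1.0)
    latency *= {"none": 1.0, "int8": 0.7, "fp8": 0.6}[cand["quantize"]]
    quality_penalty = {"none": 0.0, "int8": 1.0, "fp8": 2.0}[cand["quantize"]]
    quality_penalty += 5.0 if cand["num_steps"] == 10 else 0.0
    return -(latency + quality_penalty)  # higher is better

def crossover(a, b, rng):
    # Child inherits each knob from one of the two parents.
    return {k: rng.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(cand, rng, rate=0.2):
    out = dict(cand)
    for k, choices in SEARCH_SPACE.items():
        if rng.random() < rate:
            out[k] = rng.choice(choices)
    return out

def evolve(generations=10, pop_size=12, seed=0):
    rng = random.Random(seed)
    pop = [random_candidate(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=mock_fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep top half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=mock_fitness)

best = evolve()
print(best)
```

In the full system, the fitness signal would come from actually compiling and benchmarking the agent-generated acceleration code, and the surviving configurations would steer the planner's next round of code refinement.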