Diffusion models (DMs) and flow-matching models have demonstrated remarkable performance in image and video generation. However, such models require a large number of function evaluations (NFEs) during sampling, which makes inference costly. Consequently, fast sampling methods that preserve quality with fewer NFEs have been an active area of research. Prior training-free sampling methods, however, fail to address two key challenges simultaneously: the stiffness of the ODE (i.e., the non-straightness of the velocity field) and dependence on the semi-linear structure of the DM ODE (which limits their direct applicability to flow-matching models). In this work, we introduce the Stabilized Taylor Orthogonal Runge--Kutta (STORK) method, which addresses both challenges. We demonstrate that STORK consistently improves the quality of diffusion and flow-matching sampling for image and video generation. Code is available at https://github.com/ZT220501/STORK.