Diffusion models, which sample images by integrating stochastic differential equations, have emerged as a dominant class of generative models. However, the soundness of the diffusion process itself has received limited attention, leaving open the question of whether the problem is well-posed and well-conditioned. In this paper, we uncover a troubling property of diffusion models: they frequently exhibit an infinite Lipschitz constant near the zero timestep. This threatens the stability and accuracy of the diffusion process, which relies on integral operations. We analyze the issue from both theoretical and empirical perspectives. To address it, we propose a novel approach, dubbed E-TSDM, which eliminates the Lipschitz singularity of the diffusion model near zero. Our technique yields a substantial improvement in performance, e.g., on the high-resolution FFHQ dataset ($256\times256$). Moreover, as a byproduct of our method, we reduce the Fréchet Inception Distance (FID) of acceleration methods that rely on the network's Lipschitz continuity, including DDIM and DPM-Solver, by over $33\%$. We conduct extensive experiments on diverse datasets to validate our theory and method. Our work not only advances the understanding of the general diffusion process, but also provides insights for the design of diffusion models.