Denoising is intuitively related to projection. Indeed, under the manifold hypothesis, adding random noise is approximately equivalent to orthogonal perturbation. Hence, learning to denoise is approximately learning to project. In this paper, we use this observation to reinterpret denoising diffusion models as approximate gradient descent applied to the Euclidean distance function. We then provide a straightforward convergence analysis of the DDIM sampler under simple assumptions on the projection error of the denoiser. Finally, we propose a new sampler based on two simple modifications to DDIM, using insights from our theoretical results. In as few as 5-10 function evaluations, our sampler achieves state-of-the-art FID scores on pretrained CIFAR-10 and CelebA models and can generate high-quality samples on latent diffusion models.
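To make the link between denoising, projection, and gradient descent concrete, the following toy sketch may help. It is not the sampler proposed in the paper. It assumes a stand-in "manifold" (the unit circle in the plane), an ideal denoiser that returns the exact projection onto it, and a noise schedule `sigmas` chosen by hand so that the first value matches the starting distance. All function names are hypothetical. Under these assumptions, the deterministic DDIM-style update x + (sigma_next - sigma) * eps equals a gradient step on f(x) = 0.5 * dist(x)^2 with step size 1 - sigma_next / sigma, since the gradient of f is x - proj(x).

```python
import math


def project_to_circle(x):
    """Exact projection onto the unit circle (toy stand-in for a data manifold)."""
    norm = math.hypot(x[0], x[1])
    if norm == 0.0:
        return (1.0, 0.0)  # the projection is not unique at the origin; pick one point
    return (x[0] / norm, x[1] / norm)


def distance_to_circle(x):
    """Euclidean distance from x to the unit circle."""
    return abs(math.hypot(x[0], x[1]) - 1.0)


def ideal_noise_prediction(x, sigma):
    """Noise estimate implied by an ideal denoiser (assumption: denoiser == projection)."""
    p = project_to_circle(x)
    return ((x[0] - p[0]) / sigma, (x[1] - p[1]) / sigma)


def ddim_like_sample(x, sigmas):
    """Deterministic DDIM-style updates: x <- x + (sigma_next - sigma) * eps.

    With the ideal denoiser above, each update is
    x <- x - (1 - sigma_next / sigma) * (x - proj(x)),
    a gradient step on 0.5 * dist(x)^2.
    """
    trajectory = [x]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        eps = ideal_noise_prediction(x, sigma)
        x = (x[0] + (sigma_next - sigma) * eps[0],
             x[1] + (sigma_next - sigma) * eps[1])
        trajectory.append(x)
    return trajectory


# Start at distance 2 from the circle, with a hand-picked decreasing schedule.
start = (3.0, 0.0)
sigmas = [2.0, 1.0, 0.5, 0.25, 0.0]
trajectory = ddim_like_sample(start, sigmas)
distances = [distance_to_circle(p) for p in trajectory]
print(distances)
```

In this idealized setting the distance to the circle tracks the schedule exactly at every step. A learned denoiser only approximates the projection, which is why the convergence analysis is stated in terms of assumptions on the denoiser's projection error.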