Denoising diffusion models are a recent class of generative models exhibiting state-of-the-art performance in image and audio synthesis. Such models approximate the time-reversal of a forward noising process that transports a target distribution to a reference distribution, which is usually Gaussian. Despite their strong empirical performance, their theoretical analysis remains limited. In particular, all current approaches crucially assume that the target distribution admits a density w.r.t. the Lebesgue measure. This does not cover settings where the target distribution is supported on a lower-dimensional manifold or is given by some empirical distribution. In this paper, we bridge this gap by providing the first convergence results for diffusion models in this more general setting. Specifically, we provide quantitative bounds on the Wasserstein distance of order one between the target data distribution and the generative distribution of the diffusion model.
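To make the setting concrete, the sketch below shows a target given by an empirical distribution (two atoms at -2 and +2, so it has no Lebesgue density) pushed through a forward noising process. It also measures how close the noised samples are to the Gaussian reference in Wasserstein-1 distance. This is an illustration only, not the paper's method or its bounds. It assumes an Ornstein-Uhlenbeck forward process dX = -X dt + sqrt(2) dW, whose marginal given X_0 is e^{-t} X_0 + sqrt(1 - e^{-2t}) Z with Z ~ N(0, 1). It also assumes a one-dimensional W1 estimate between equal-size samples, computed by matching order statistics. The helper names `forward_ou` and `wasserstein1_1d` are hypothetical.

```python
import math
import random


def forward_ou(x0: float, t: float, rng: random.Random) -> float:
    """Sample X_t given X_0 = x0 for the (assumed) OU process dX = -X dt + sqrt(2) dW.

    The stationary (reference) distribution of this process is N(0, 1).
    """
    mean_coef = math.exp(-t)                    # contraction of the initial point
    std = math.sqrt(1.0 - math.exp(-2.0 * t))   # accumulated Gaussian noise
    return mean_coef * x0 + std * rng.gauss(0.0, 1.0)


def wasserstein1_1d(xs: list[float], ys: list[float]) -> float:
    """W1 between two 1D empirical measures with the same number of atoms.

    On the real line the optimal coupling pairs sorted samples, so
    W1 = (1/n) * sum_i |x_(i) - y_(i)|.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 2000
    # Hypothetical target: empirical measure on the atoms -2 and +2.
    target = [rng.choice((-2.0, 2.0)) for _ in range(n)]
    # Samples from the Gaussian reference N(0, 1).
    reference = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for t in (0.1, 1.0, 3.0):
        noised = [forward_ou(x, t, rng) for x in target]
        print(f"t={t}: W1(noised, N(0,1)) ~ {wasserstein1_1d(noised, reference):.3f}")
```

As t grows, the estimated W1 distance to the reference should shrink toward sampling noise. The generative model has to undo exactly this transport. Note that for any t > 0 the noised distribution is a Gaussian mixture with a smooth density, even though the target itself has none.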