Because diffusion models have shown impressive performance in a number of tasks, such as image synthesis, recent works have tended to prove (under certain assumptions) that these models have strong approximation capabilities. In this paper, we show that current diffusion models actually have an expressive bottleneck in backward denoising and that some assumptions made by existing theoretical guarantees are too strong. Based on this finding, we prove that diffusion models have unbounded errors in both local and global denoising. In light of our theoretical studies, we introduce soft mixture denoising (SMD), an expressive and efficient model for backward denoising. SMD not only permits diffusion models to approximate any Gaussian mixture distribution well in theory, but is also simple and efficient to implement. Our experiments on multiple image datasets show that SMD significantly improves different types of diffusion models (e.g., DDPM), especially when few backward iterations are used.
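As a rough intuition for the unbounded-error claim, the following toy sketch (not the paper's proof or method) measures how badly a single Gaussian fits a two-mode Gaussian mixture as the modes move apart. It assumes that the expressive bottleneck comes from modeling each backward denoising step with a single Gaussian, which the abstract itself does not spell out; the function names, the 1D setting, and the grid-based integration are all hypothetical choices made for illustration.

```python
import numpy as np


def log_normal_pdf(x, mean, std):
    """Log-density of a 1D Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * std**2) - (x - mean) ** 2 / (2 * std**2)


def kl_mixture_vs_single_gaussian(m, s=1.0, n_grid=200001):
    """Numerically estimate KL(p || q), where
    p = 0.5 * N(-m, s^2) + 0.5 * N(m, s^2)   (toy bimodal target)
    q = N(0, s^2 + m^2)                      (moment-matched single Gaussian,
                                              the KL(p || q)-optimal Gaussian fit).
    """
    # Integration grid wide enough to cover both modes and their tails.
    half_width = m + 12 * s
    x = np.linspace(-half_width, half_width, n_grid)
    dx = x[1] - x[0]

    # Mixture log-density, computed stably in log space.
    log_p = np.logaddexp(log_normal_pdf(x, -m, s), log_normal_pdf(x, m, s)) + np.log(0.5)

    # Single Gaussian with the same mean (0) and variance (s^2 + m^2) as p.
    q_std = np.sqrt(s**2 + m**2)
    log_q = log_normal_pdf(x, 0.0, q_std)

    # Riemann-sum approximation of the KL integral.
    p = np.exp(log_p)
    return float(np.sum(p * (log_p - log_q)) * dx)


if __name__ == "__main__":
    for m in (0.0, 2.0, 8.0, 32.0):
        print(m, kl_mixture_vs_single_gaussian(m))
```

In this toy setting the divergence is zero when the two modes coincide and grows without bound as they separate (for well-separated modes it behaves like 0.5 * log(1 + m^2 / s^2) - log 2), which is the kind of failure a mixture-based denoiser is meant to avoid.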