Although diffusion models (DMs) have shown promising performance in a number of tasks (e.g., speech synthesis and image generation), they might suffer from error propagation because of their sequential structure. However, this is not certain, because some sequential models, such as the Conditional Random Field (CRF), are free from this problem. To address this issue, we develop a theoretical framework that mathematically formulates error propagation in the architecture of DMs. The framework contains three elements: modular error, cumulative error, and a propagation equation. The modular and cumulative errors are related by the equation, which shows that DMs are indeed affected by error propagation. Our theoretical study also suggests that the cumulative error is closely related to the generation quality of DMs. Based on this finding, we apply the cumulative error as a regularization term to reduce error propagation. Because the term is computationally intractable, we derive an upper bound on it and design a bootstrap algorithm to efficiently estimate the bound for optimization. We have conducted extensive experiments on multiple image datasets, showing that our proposed regularization reduces error propagation, significantly improves vanilla DMs, and outperforms previous baselines.
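To make the distinction between per-step and accumulated error concrete, the following toy sketch may help. It is not the framework's actual definition of modular error, cumulative error, or the propagation equation; the linear recurrence, the function name `propagate`, and the `gain` parameter are all assumptions chosen only for illustration. Each step contributes its own error, and a hypothetical gain controls how much of the error carried in from earlier steps survives into the next one.

```python
def propagate(modular_errors, gain=1.0):
    """Toy linear recurrence (an assumption, not the paper's equation):
    cumulative[k] = gain * cumulative[k - 1] + modular_errors[k].
    """
    cumulative = []
    carried = 0.0  # error inherited from all earlier steps
    for err in modular_errors:
        carried = gain * carried + err
        cumulative.append(carried)
    return cumulative


if __name__ == "__main__":
    per_step = [1.0, 1.0, 1.0, 1.0]  # identical error at every step
    for gain in (1.0, 0.5, 0.0):
        print(f"gain={gain}: {propagate(per_step, gain)}")
```

In this toy setting, a gain of 0 means earlier errors never reach later steps, so the accumulated error stays equal to the per-step error, while a gain of 1 lets it grow with the number of steps. Whether a given sequential model behaves like the first case or the second is the kind of question the framework is meant to settle for DMs.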