Denoising Diffusion Probabilistic Models have shown impressive generation quality, although their long sampling chain leads to high computational cost. In this paper, we observe that a long sampling chain also causes error accumulation, a phenomenon similar to the \textbf{exposure bias} problem in autoregressive text generation. Specifically, there is a discrepancy between training and testing: the former is conditioned on ground-truth samples, while the latter is conditioned on previously generated results. To alleviate this problem, we propose a very simple but effective training regularization, which consists of perturbing the ground-truth samples to simulate inference-time prediction errors. We empirically show that the proposed input perturbation significantly improves sample quality while reducing both training and inference time. For instance, on CelebA 64$\times$64, we achieve a new state-of-the-art FID score of 1.27 while saving 37.5% of the training time.
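The abstract does not spell out the exact form of the perturbation, so the following is only a minimal sketch of one plausible reading. It assumes the standard DDPM forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ and adds a second, independent Gaussian term to the network input while keeping the unperturbed noise as the regression target. The function name `perturbed_forward_sample`, the scale `gamma`, and its default value are hypothetical and chosen for illustration only.

```python
import numpy as np


def perturbed_forward_sample(x0, alpha_bar_t, gamma=0.1, rng=None):
    """Return (y_t, eps): a perturbed noisy input and its regression target.

    x0          : clean training samples, any array shape
    alpha_bar_t : cumulative noise-schedule product at step t, in (0, 1)
    gamma       : perturbation scale (hypothetical hyperparameter)
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(x0.shape)  # noise the network is trained to predict
    xi = rng.standard_normal(x0.shape)   # extra perturbation, independent of eps
    # A standard DDPM input would use eps alone; here the input is built from
    # eps + gamma * xi, so the network sees a slightly "wrong" x_t in training.
    y_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * (eps + gamma * xi)
    return y_t, eps
```

In a training loop under these assumptions, the denoising network would receive `y_t` as input and be trained to regress `eps`. Setting `gamma=0` recovers the unperturbed forward sample, which makes the regularization easy to switch off for comparison.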