Denoising Diffusion Probabilistic Models have shown impressive generation quality, although their long sampling chain leads to high computational costs. In this paper, we observe that a long sampling chain also leads to an error accumulation phenomenon, which is similar to the exposure bias problem in autoregressive text generation. Specifically, we note that there is a discrepancy between training and testing, since the former is conditioned on the ground truth samples, while the latter is conditioned on the previously generated results. To alleviate this problem, we propose a very simple but effective training regularization, which consists of perturbing the ground truth samples to simulate the inference-time prediction errors. We empirically show that the proposed input perturbation significantly improves sample quality while reducing both training and inference time. For instance, on CelebA 64$\times$64, we achieve a new state-of-the-art FID score of 1.27, while saving 37.5% of the training time. The code is publicly available at https://github.com/forever208/DDPM-IP
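The abstract does not spell out how the ground truth samples are perturbed, so the following is only a minimal sketch of one plausible reading: extra Gaussian noise, scaled by a small factor, is added to the noise used to build the training input, while the regression target stays the unperturbed noise. The function names, the `gamma` parameter and its default value, and the noise-prediction loss are assumptions made for illustration and are not taken from the text above.

```python
import math
import random


def perturbed_training_input(x0, alpha_bar_t, gamma=0.1, rng=random):
    """Build a noisy training input whose noise term is slightly perturbed.

    x0          : flat list of floats, a clean (ground truth) sample
    alpha_bar_t : cumulative noise-schedule product at step t, in (0, 1)
    gamma       : perturbation strength (assumed value; gamma=0 recovers
                  the standard DDPM forward sample)
    Returns (y_t, eps): the perturbed input and the unperturbed target noise.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # standard DDPM noise (the target)
    xi = [rng.gauss(0.0, 1.0) for _ in x0]   # extra noise imitating prediction error
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    y_t = [signal * x + noise * (e + gamma * z) for x, e, z in zip(x0, eps, xi)]
    return y_t, eps


def mse(prediction, target):
    """Mean squared error between two equally long lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)
```

In a training loop, a noise-prediction network (hypothetical here) would receive `y_t` and the step index, and `mse` would compare its output with `eps`. Because only the input is perturbed, the sampling procedure itself needs no change under this reading.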