Diffusion Probabilistic Models (DPMs) have shown remarkable efficacy in synthesizing high-quality images. However, their inference process typically requires many iterative steps, potentially hundreds, which can exacerbate the exposure bias arising from the discrepancy between training and inference. Previous work has attempted to mitigate this issue by perturbing inputs during training, which requires retraining the DPM. In this work, we conduct a systematic study of exposure bias in DPMs and find that it can be alleviated, without retraining the model, by a novel sampling method that we propose. We show empirically and theoretically that, during inference, for each backward time step $t$ and corresponding state $\hat{x}_t$, there may exist another time step $t_s$ that couples better with $\hat{x}_t$. Based on this finding, we introduce a sampling method named Time-Shift Sampler. Our framework can be seamlessly integrated into existing sampling algorithms, such as DDPM, DDIM, and high-order solvers, incurring only minimal additional computation. Experimental results show that our method brings significant and consistent improvements in FID scores across different datasets and sampling methods. For example, on CIFAR-10 with 10 sampling steps, integrating Time-Shift Sampler into F-PNDM yields an FID of 3.88, a 44.49\% improvement over F-PNDM and a better score than vanilla DDIM with 100 sampling steps. We will release the code upon acceptance.
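To make the idea of a better-coupled time step $t_s$ concrete, the following is a minimal sketch rather than the method's actual implementation. The abstract does not state how $t_s$ is chosen, so the selection rule here is an assumption made purely for illustration: within a small window around $t$, pick the step whose approximate forward-process marginal variance, taken as $1 - \bar{\alpha}_\tau$, is closest to the sample variance of $\hat{x}_t$. The names `make_alpha_bar` and `select_shifted_step`, the `window` parameter, and the linear beta schedule are all hypothetical.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def select_shifted_step(x_hat, t, alpha_bar, window=10):
    """Pick a candidate t_s near t whose marginal variance best matches x_hat.

    Hypothetical rule, not taken from the abstract: the forward marginal
    variance at step tau is approximated by 1 - alpha_bar[tau], and the
    candidate closest to the sample variance of x_hat is returned.
    """
    # Clamp the search window to valid step indices.
    lo = max(0, t - window // 2)
    hi = min(len(alpha_bar) - 1, t + window // 2)
    candidates = np.arange(lo, hi + 1)

    target_var = 1.0 - alpha_bar[candidates]
    sample_var = np.var(x_hat)
    best = np.argmin(np.abs(target_var - sample_var))
    return int(candidates[best])


alpha_bar = make_alpha_bar()

# Toy state whose variance equals the assumed marginal variance at step 500.
scale = np.sqrt(1.0 - alpha_bar[500])
x_hat = scale * np.array([1.0, -1.0, 1.0, -1.0])

# The sampler is nominally at step 503, but the state matches step 500 better.
t_s = select_shifted_step(x_hat, t=503, alpha_bar=alpha_bar, window=10)
```

In a sampling loop, the selected `t_s` would replace `t` as the time step passed to the denoising network for the current state. Because the search only compares a handful of scalar variances per step, this kind of rule is consistent with the claim of minimal additional computation, although the actual criterion used by Time-Shift Sampler may differ.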