Diffusion Probabilistic Models (DPMs) have shown remarkable efficacy in synthesizing high-quality images. However, their inference process typically requires many iterative steps, potentially hundreds, which can exacerbate the exposure bias that arises from the discrepancy between training and inference. Previous work has attempted to mitigate this issue by perturbing inputs during training, which requires retraining the DPM. In this work, we conduct a systematic study of exposure bias in DPMs and find that it can be alleviated, without retraining the model, by a novel sampling method that we propose. We show empirically and theoretically that, during inference, for each backward time step $t$ and corresponding state $\hat{x}_t$, there may exist another time step $t_s$ that couples better with $\hat{x}_t$. Based on this finding, we introduce a sampling method named Time-Shift Sampler. Our framework can be seamlessly integrated into existing sampling algorithms, such as DDPM, DDIM, and other high-order solvers, at minimal additional computational cost. Experimental results show that our method brings significant and consistent improvements in FID scores across different datasets and sampling methods. For example, integrating Time-Shift Sampler into F-PNDM yields an FID of 3.88 on CIFAR-10 with 10 sampling steps, a 44.49\% improvement over F-PNDM, and outperforms vanilla DDIM with 100 sampling steps. Our code is available at https://github.com/Mingxiao-Li/TS-DPM.
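To make the idea of a shifted time step $t_s$ concrete, the following is a minimal sketch rather than the released implementation. The abstract does not state how $t_s$ is chosen, so the selection rule here is an assumption: within a small window around $t$, pick the step whose nominal noise variance $1 - \bar{\alpha}_\tau$ is closest to the empirical variance of $\hat{x}_t$. The function names, the linear $\beta$ schedule, and the `window` parameter are all hypothetical and used only for illustration.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def select_shifted_step(x_hat, t, alpha_bar, window=10):
    """Return a time step t_s near t that best matches the state x_hat.

    Assumed criterion (not taken from the abstract): match the empirical
    variance of x_hat to the nominal noise variance 1 - alpha_bar[tau]
    over the candidates tau in [t - window // 2, t + window // 2].
    """
    lo = max(0, t - window // 2)
    hi = min(len(alpha_bar) - 1, t + window // 2)
    candidates = np.arange(lo, hi + 1)
    target_var = 1.0 - alpha_bar[candidates]   # nominal variance per candidate
    sample_var = np.var(x_hat)                 # empirical variance of the state
    best = np.argmin(np.abs(target_var - sample_var))
    return int(candidates[best])


if __name__ == "__main__":
    alpha_bar = linear_alpha_bar()
    rng = np.random.default_rng(0)
    x_hat = rng.normal(scale=0.9, size=(3, 32, 32))  # stand-in for a sampler state
    t_s = select_shifted_step(x_hat, t=500, alpha_bar=alpha_bar)
    print("shifted step:", t_s)
```

In a sampler loop, the returned $t_s$ would replace $t$ as the time-step input to the noise-prediction network for the current update, which is why a rule of this kind needs no retraining and adds only a variance computation per step.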