Diffusion Probabilistic Models (DPMs) have shown remarkable efficacy in synthesizing high-quality images. However, their inference process typically requires many iterative steps, often hundreds, which can exacerbate exposure bias arising from the discrepancy between training and inference. Previous work has attempted to mitigate this issue by perturbing inputs during training, which requires retraining the DPM. In this work, we conduct a systematic study of exposure bias in DPMs and find that it can be alleviated, without retraining the model, by a novel sampling method that we propose. We show empirically and theoretically that, during inference, for each backward time step $t$ and corresponding state $\hat{x}_t$, there may exist another time step $t_s$ that couples better with $\hat{x}_t$. Based on this finding, we introduce a sampling method named Time-Shift Sampler. Our framework can be seamlessly integrated into existing sampling algorithms, such as DDPM, DDIM, and high-order solvers, at only minimal additional computational cost. Experimental results show that our method brings significant and consistent improvements in FID scores across different datasets and sampling methods. For example, integrating Time-Shift Sampler into F-PNDM yields an FID of 3.88 on CIFAR-10 with 10 sampling steps, a 44.49\% improvement over F-PNDM, and outperforms vanilla DDIM with 100 sampling steps. We will release the code upon acceptance.
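The abstract does not state how the shifted step $t_s$ is chosen. As an illustration only, the following is a minimal sketch under one assumed criterion: within a window around $t$, pick the step whose nominal forward-process variance is closest to the empirical variance of $\hat{x}_t$. The names `linear_alpha_bar` and `select_shifted_step`, the linear noise schedule, and the parameters `window` and `data_var` are all hypothetical and are not taken from the authors' code.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule.

    The linear schedule and its default endpoints are assumptions.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def select_shifted_step(x_hat, t, alpha_bar, window=10, data_var=0.25):
    """Return a step t_s in [t - window, t + window] for the state x_hat.

    Hypothetical criterion: if clean data has variance `data_var`, the
    forward process gives x_tau the marginal variance
        alpha_bar[tau] * data_var + (1 - alpha_bar[tau]).
    We choose the candidate whose nominal variance is closest to the
    empirical variance of x_hat.
    """
    lo = max(0, t - window)
    hi = min(len(alpha_bar) - 1, t + window)
    candidates = np.arange(lo, hi + 1)
    nominal_var = alpha_bar[candidates] * data_var + (1.0 - alpha_bar[candidates])
    sample_var = float(np.var(x_hat))
    best = np.argmin(np.abs(nominal_var - sample_var))
    return int(candidates[best])
```

In a sampler loop, one would call `select_shifted_step(x_hat, t, alpha_bar)` before each denoising step and feed the returned `t_s`, rather than `t`, to the noise-prediction network. The extra cost is a single variance computation per step, in keeping with the minimal overhead described above.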