Denoising Probabilistic Models (DPMs) are an emerging class of generative models that excel at generating diverse, high-quality images. However, most current training methods for DPMs neglect the correlation between timesteps, which limits the quality of the generated images. We show theoretically that this issue can be caused by the cumulative estimation gap between the predicted and the actual trajectory. To reduce that gap, we propose a novel \textit{sequence-aware} loss that improves sampling quality. Furthermore, we theoretically show that our proposed loss is a tighter upper bound on the estimation loss than the conventional loss used in DPMs. Experimental results on several benchmark datasets, including CIFAR10, CelebA, and CelebA-HQ, consistently show that our method improves image generation quality, as measured by FID and Inception Score, over several DPM baselines. Our code and pre-trained checkpoints are available at \url{https://github.com/viettmab/SA-DPM}.