Previous work has demonstrated that, in the Variance Preserving (VP) scenario, the nascent Directly Denoising Diffusion Models (DDDM) can generate high-quality images in one step while achieving even better performance in multistep sampling. However, the Pseudo-LPIPS loss used in DDDM raises concerns about bias in evaluation. Here, we propose a unified DDDM (uDDDM) framework that generates images in one step or multiple steps for both the Variance Preserving (VP) and Variance Exploding (VE) cases. We provide theoretical proofs of the existence and uniqueness of the model's solution paths, as well as the non-intersecting property of the sampling paths. Additionally, we propose an adaptive Pseudo-Huber loss function to balance convergence to the true solution against the stability of the convergence process. Through a comprehensive evaluation, we demonstrate that uDDDMs achieve FID scores comparable to the best-performing methods available for CIFAR-10 in both VP and VE. Specifically, uDDDM achieves one-step generation on CIFAR-10 with FID scores of 2.63 and 2.53 for VE and VP, respectively. By extending the sampling to 1000 steps, we further reduce the FID scores to 1.71 and 1.65 for VE and VP, respectively, achieving state-of-the-art performance in both cases.
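For readers unfamiliar with the loss family named above, the following is a minimal sketch of the standard (non-adaptive) Pseudo-Huber distance, d(x, y) = sqrt(||x - y||^2 + c^2) - c. The abstract does not say how the adaptive variant chooses c, so the sketch simply takes c as a parameter. The function name `pseudo_huber` and its signature are illustrative assumptions, not the authors' implementation.

```python
import math


def pseudo_huber(x, y, c):
    """Standard Pseudo-Huber distance between two equal-length vectors.

    Computes sqrt(||x - y||^2 + c^2) - c. The constant c is an assumed
    plain parameter here; the adaptive schedule for c is not given in
    the abstract and is therefore not modelled.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if c < 0:
        raise ValueError("c must be non-negative")
    squared_norm = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.sqrt(squared_norm + c * c) - c
```

With c = 0 the expression reduces to the Euclidean (L2) distance, while for c much larger than ||x - y|| it behaves approximately like ||x - y||^2 / (2c), a scaled squared error. A single constant thus interpolates between the two regimes, which is one plausible reading of the trade-off between convergence and stability that the abstract mentions.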