We introduce a Cascaded Diffusion Model (Cas-DM) that improves a Denoising Diffusion Probabilistic Model (DDPM) by effectively incorporating additional metric functions in training. Metric functions such as the LPIPS loss have proven highly effective in consistency models derived from score matching. For their diffusion counterparts, however, both the methodology and the efficacy of adding extra metric functions remain unclear. One major challenge is the mismatch between the noise that a DDPM predicts at each step and the clean image on which the metric function works well. To address this problem, we propose Cas-DM, a network architecture that cascades two network modules so that metric functions can be applied effectively during diffusion model training. The first module, similar to a standard DDPM, learns to predict the added noise and is unaffected by the metric function. The second, cascaded module learns to predict the clean image, which makes the metric function straightforward to compute. Experimental results show that the proposed diffusion model backbone enables effective use of the LPIPS loss, leading to state-of-the-art image quality (FID, sFID, IS) on various established benchmarks.
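The mismatch between noise prediction and clean-image metrics can be made concrete with the standard DDPM forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. The sketch below is a minimal illustration in plain Python, not the Cas-DM implementation. It assumes "images" are short lists of floats, uses mean squared error as a stand-in for LPIPS (which requires a pretrained network), and introduces the hypothetical names `q_sample`, `x0_from_eps`, `cascaded_losses`, `noise_module`, and `image_module`. How the second module actually consumes the first module's output is not specified in the abstract, so the wiring shown here is an assumption.

```python
import math
import random


def q_sample(x0, eps, alpha_bar):
    """Standard DDPM forward process: x_t = sqrt(a)*x0 + sqrt(1-a)*eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def x0_from_eps(x_t, eps_hat, alpha_bar):
    """Invert the forward process: clean-image estimate from a noise estimate."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - b * e) / a for xt, e in zip(x_t, eps_hat)]


def mse(u, v):
    """Mean squared error; used here only as a stand-in for a metric like LPIPS."""
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)


def cascaded_losses(x0, eps, alpha_bar, noise_module, image_module, metric=mse):
    """Compute the two loss terms of a two-module cascade (assumed wiring).

    noise_module(x_t, alpha_bar) -> eps_hat            (first module)
    image_module(x_t, eps_hat, alpha_bar) -> x0_hat    (second module)
    """
    x_t = q_sample(x0, eps, alpha_bar)
    eps_hat = noise_module(x_t, alpha_bar)
    noise_loss = mse(eps_hat, eps)  # plain noise-prediction objective
    # Assumption: in an autograd framework, eps_hat would pass through a
    # stop-gradient here (e.g. detach() in PyTorch) so that the metric loss
    # cannot update the first module.
    x0_hat = image_module(x_t, eps_hat, alpha_bar)
    metric_loss = metric(x0_hat, x0)  # metric is evaluated on clean images
    return noise_loss, metric_loss


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(8)]

    # Hypothetical oracle noise predictor, for illustration only.
    oracle = lambda x_t, alpha_bar: eps
    print(cascaded_losses(x0, eps, 0.5, oracle, x0_from_eps))

    # A small, fixed noise error becomes a large clean-image error when
    # alpha_bar is small, because it is scaled by sqrt(1 - a) / sqrt(a).
    noisy = lambda x_t, alpha_bar: [e + 0.01 for e in eps]
    for alpha_bar in (0.9, 0.5, 0.01):
        print(alpha_bar, cascaded_losses(x0, eps, alpha_bar, noisy, x0_from_eps))
```

With the oracle predictor both losses are zero up to rounding. With the perturbed predictor the noise loss stays the same across timesteps while the clean-image loss grows as ᾱ_t shrinks, which is one way to see why a metric defined on clean images does not transfer directly to a noise-prediction objective.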