We propose using a Gaussian Mixture Model (GMM) as the reverse transition operator (kernel) within the Denoising Diffusion Implicit Models (DDIM) framework, one of the most widely used approaches for accelerated sampling from pre-trained Denoising Diffusion Probabilistic Models (DDPM). Specifically, we match the first- and second-order central moments of the DDPM forward marginals by constraining the parameters of the GMM. We find that moment matching is sufficient to obtain samples of equal or better quality than the original DDIM with Gaussian kernels. We provide experimental results with unconditional models trained on CelebAHQ and FFHQ and with class-conditional models trained on ImageNet. Our results suggest that the GMM kernel significantly improves the quality of the generated samples, as measured by FID and IS, when the number of sampling steps is small. For example, on ImageNet 256x256 with 10 sampling steps, we achieve an FID of 6.94 and an IS of 207.85 with a GMM kernel, compared to 10.15 and 196.73, respectively, with a Gaussian kernel.
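The idea of constraining GMM parameters so that the mixture reproduces a given mean and variance can be illustrated with a minimal one-dimensional sketch. This is not the construction used in the paper: the function names, the shared component variance, and the `frac` parameter (the share of the target variance carried by the spread of the component means) are assumptions made purely for illustration.

```python
import numpy as np

def moment_matched_gmm(mean, var, weights, raw_offsets, frac=0.5):
    """Build a 1-D GMM whose overall mean and variance equal `mean` and `var`.

    Hypothetical helper: all components share one variance, and `frac`
    (0 <= frac < 1) is the fraction of `var` explained by the component means.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalise mixing weights
    offsets = np.asarray(raw_offsets, dtype=float)

    # First-moment constraint: weighted offsets must sum to zero.
    offsets = offsets - np.sum(weights * offsets)

    # Rescale offsets so the between-component variance equals frac * var.
    between = np.sum(weights * offsets ** 2)
    if between > 0:
        offsets = offsets * np.sqrt(frac * var / between)
        between = np.sum(weights * offsets ** 2)

    # Second-moment constraint: within + between variance must equal var.
    comp_var = var - between
    return weights, mean + offsets, comp_var

def sample_gmm(rng, weights, means, comp_var, n):
    """Draw n samples: pick a component, then add Gaussian noise."""
    idx = rng.choice(len(weights), size=n, p=weights)
    return means[idx] + np.sqrt(comp_var) * rng.standard_normal(n)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w, m, v = moment_matched_gmm(1.5, 4.0, [0.2, 0.3, 0.5], [-1.0, 0.5, 2.0], frac=0.6)
    x = sample_gmm(rng, w, m, v, 200_000)
    # Empirical moments should be close to the targets 1.5 and 4.0.
    print(x.mean(), x.var())
```

By the law of total variance, the mixture variance is the shared component variance plus the weighted spread of the component means, so fixing both constraints leaves the mixture weights and the shape of the offsets free. Those remaining degrees of freedom are what allow a non-Gaussian kernel to share its first two moments with a Gaussian one.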