Diffusion models (DMs) are a class of generative models that have had a major impact on image synthesis and beyond, achieving state-of-the-art results across a variety of generative tasks. A wide range of conditioning inputs, such as text or bounding boxes, can be used to control generation. In this work, we propose a conditioning mechanism that uses Gaussian mixture models (GMMs) as feature conditioning to guide the denoising process. Using set theory, we provide a comprehensive theoretical analysis showing that the conditional latent distributions based on features and on classes differ significantly, and that conditioning the latent distribution on features produces fewer defective generations than conditioning on classes. Two diffusion models conditioned on the Gaussian mixture model are trained separately for comparison, and the experiments support our findings. We also propose a novel gradient function, the negative Gaussian mixture gradient (NGMG), and apply it to diffusion model training with an additional classifier, which improves training stability. Finally, we prove theoretically that NGMG shares the benefit of the Earth Mover (Wasserstein) distance as a more sensible cost function when learning distributions supported by low-dimensional manifolds.
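The claim that the Earth Mover (Wasserstein) distance is a more sensible cost function for distributions supported on low-dimensional manifolds can be illustrated with a standard toy case. The sketch below does not implement NGMG, whose definition is not given in the abstract. It only compares the Jensen-Shannon divergence with the 1-D Earth Mover distance for two point masses whose supports do not overlap. The grid `xs` and the helper names `js_divergence`, `wasserstein_1d`, and `point_mass` are illustrative assumptions, not part of the paper.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 wherever a_i > 0 here.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def wasserstein_1d(p, q, xs):
    """Earth Mover (W1) distance for two distributions on a sorted 1-D grid xs.

    In one dimension, W1 equals the integral of |CDF_p - CDF_q|.
    """
    total, cdf_gap = 0.0, 0.0
    for i in range(len(xs) - 1):
        cdf_gap += p[i] - q[i]
        total += abs(cdf_gap) * (xs[i + 1] - xs[i])
    return total


def point_mass(index, size):
    """Distribution with all probability on one grid point (hypothetical helper)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Assumed toy setup: the target sits at x = 0, the model at x = theta.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
target = point_mass(0, len(xs))

for k, theta in enumerate(xs):
    model = point_mass(k, len(xs))
    js = js_divergence(target, model)
    w1 = wasserstein_1d(target, model, xs)
    print(f"theta={theta:.1f}  JS={js:.4f}  W1={w1:.4f}")
```

In this setup the Jensen-Shannon divergence stays at log 2 for every nonzero offset, so it gives no signal about how far the model is from the target, while the Earth Mover distance grows linearly with the offset. That is the property the abstract states NGMG shares.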