Diffusion models have emerged as state-of-the-art generative models for high-fidelity image synthesis, particularly in their classifier-guided and classifier-free guided forms. However, standard classifier guidance concentrates probability mass around the high-density class means, leading to poor coverage of rare samples in the tails of the class-conditional distributions. Recent work on diffusion-based tail sampling mitigates this by training an additional low-density-seeking classifier with a synthetic-vs-real discriminator, at the cost of additional networks and training. In parallel, a number of samplers and distillation techniques accelerate or refine diffusion sampling but do not explicitly address long-tail coverage. We propose a purely sampling-time, density-aware extension of classifier-guided conditional diffusion models that targets low-density regions without any additional training. Unlike most diffusion models, which apply guidance to the predicted noise, we apply guidance directly to the noisy images. Starting from a pretrained conditional diffusion model and classifier on ImageNet, we modify the guided reverse dynamics in two ways: a modified classifier gradient steers trajectories toward low-confidence regions, and, at each time step, a second term guides the sampling process toward the predicted real image. The first guidance term helps explore low-probability samples, and the second keeps the generated samples close to the real data manifold. The proposed sampler consistently improves the recall of the ADM model at 64x64 resolution while maintaining a comparable FID. For a 256x256 ADM model, we present visual results for different combinations of the two guidance terms. We also show that standard ADM classifier guidance, combined with predicted-real-image guidance, helps generate samples of high perceptual quality with a 256x256 ADM model on ImageNet.
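The two guidance terms described in the abstract can be illustrated with a minimal NumPy sketch. Only `predict_x0` is a standard identity (the usual DDPM estimate of the clean image from a noisy image and the predicted noise). The function `guided_mean`, its parameters `s_cls` and `s_x0`, and the use of a negative classifier scale as a stand-in for the "modified classifier gradient" are assumptions made for illustration; the abstract does not specify the exact update rule, so this should not be read as the authors' sampler.

```python
import numpy as np


def predict_x0(x_t, eps_hat, alpha_bar_t):
    """Standard DDPM estimate of the clean image x_0.

    Inverts x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    using the predicted noise eps_hat in place of eps.
    """
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)


def guided_mean(mean, variance, x_t, x0_hat, class_grad, s_cls=-1.0, s_x0=0.1):
    """Hypothetical combination of two guidance terms on the noisy image.

    mean, variance: unguided reverse-step mean and (diagonal) variance.
    class_grad: gradient of log p(y | x_t) with respect to x_t.
    s_cls: classifier scale; a negative value pushes toward lower
        classifier confidence (an assumed placeholder, not the paper's rule).
    s_x0: strength of the pull toward the predicted clean image (assumed).
    """
    # Classifier term has the same form as ADM guidance: scale * variance * gradient.
    classifier_term = s_cls * variance * class_grad
    # Second term moves the sample from x_t toward the predicted clean image.
    x0_term = s_x0 * (x0_hat - x_t)
    return mean + classifier_term + x0_term


# Toy usage on random arrays standing in for images and network outputs.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(3, 8, 8))
eps = rng.normal(size=(3, 8, 8))
alpha_bar_t = 0.5
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

x0_hat = predict_x0(x_t, eps, alpha_bar_t)  # exact here because eps is the true noise
grad = rng.normal(size=x_t.shape)           # placeholder for a classifier gradient
new_mean = guided_mean(x_t, 0.01, x_t, x0_hat, grad)
```

In a real sampler, `eps` would come from the pretrained diffusion network and `grad` from backpropagating through the pretrained classifier; setting `s_x0 = 0` and a positive `s_cls` reduces the update to the usual classifier-guided mean shift.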