Score distillation sampling has been pivotal for integrating diffusion models into the generation of complex visuals. Despite impressive results, it suffers from mode collapse and a lack of diversity. To cope with this challenge, we leverage the gradient flow interpretation of score distillation to propose Repulsive Score Distillation (RSD). In particular, we propose a variational framework based on repulsion among an ensemble of particles, which promotes diversity. Under a variational approximation that incorporates a coupling among particles, the repulsion appears as a simple regularization that lets particles interact based on their relative pairwise similarity, measured, e.g., via radial basis kernels. We design RSD for both unconstrained and constrained sampling scenarios. For constrained sampling, we focus on inverse problems in the latent space, which leads to an augmented variational formulation that strikes a good balance between compute, quality, and diversity. Our extensive experiments on text-to-image generation and inverse problems demonstrate that RSD achieves a superior trade-off between diversity and quality compared with state-of-the-art alternatives.
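To make the kernel-based repulsion concrete, the following is a minimal sketch of one repulsive particle update, assuming an SVGD-style repulsion term built from a radial basis kernel. The function names (`rbf_kernel`, `repulsion_direction`, `repulsive_step`), the fixed bandwidth, and the generic `score_fn` are hypothetical placeholders for illustration; the abstract does not specify the exact form of the RSD regularizer, so this should not be read as the actual RSD update rule.

```python
import numpy as np


def rbf_kernel(particles, bandwidth):
    """Pairwise RBF similarities for an (n, d) array of particles.

    Returns the (n, n) kernel matrix and the (n, n, d) array of
    differences, where diffs[i, j] = x_i - x_j.
    """
    diffs = particles[:, None, :] - particles[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    return kernel, diffs


def repulsion_direction(particles, bandwidth=1.0):
    """Assumed SVGD-style repulsion: sum_j k(x_i, x_j) (x_i - x_j) / h^2.

    Each particle is pushed away from the others, with nearby (more
    similar) particles contributing more strongly.
    """
    kernel, diffs = rbf_kernel(particles, bandwidth)
    return np.sum(kernel[:, :, None] * diffs, axis=1) / bandwidth ** 2


def repulsive_step(particles, score_fn, step_size=0.1,
                   repulsion_weight=1.0, bandwidth=1.0):
    """One update: follow the score-based drift plus weighted repulsion.

    score_fn is a stand-in for whatever distillation gradient is used;
    it maps an (n, d) array to an (n, d) array.
    """
    drift = score_fn(particles)
    push = repulsion_direction(particles, bandwidth)
    return particles + step_size * (drift + repulsion_weight * push)
```

In this sketch, setting `repulsion_weight` to zero recovers independent per-particle updates, while larger values trade fidelity to the drift for greater spread among particles.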