SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance

Recent approaches have yielded promising results in distilling multi-step text-to-image diffusion models into one-step ones. The state-of-the-art efficient distillation technique, SwiftBrushv2 (SBv2), even surpasses the teacher model's performance with limited resources. However, our study reveals its instability when handling different diffusion model backbones due to using a fixed guidance scale within the Variational Score Distillation (VSD) loss. Another weakness of the existing one-step diffusion models is the lack of support for negative prompt guidance, which is crucial in practical image generation. This paper presents SNOOPI, a novel framework designed to address these limitations by enhancing the guidance in one-step diffusion models during both training and inference. First, we effectively enhance training stability through Proper Guidance-SwiftBrush (PG-SB), which employs a random-scale classifier-free guidance approach. By varying the guidance scale of both teacher models, we broaden their output distributions, resulting in a more robust VSD loss that enables SB to perform effectively across diverse backbones while maintaining competitive performance. Second, we propose a training-free method called Negative-Away Steer Attention (NASA), which integrates negative prompts into one-step diffusion models via cross-attention to suppress undesired elements in generated images. Our experimental results show that our proposed methods significantly improve baseline models across various metrics. Remarkably, we achieve an HPSv2 score of 31.08, setting a new state-of-the-art benchmark for one-step diffusion models.
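The two guidance ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the sampling range for the guidance scale, nor the exact form of NASA. The sketch assumes a uniform draw from a hypothetical range [2.0, 6.0], and it assumes that negative prompts are applied by subtracting a scaled copy of the negative-prompt cross-attention output from the positive-prompt output. All function names, the `alpha` weight, and the list-based vectors are illustrative choices.

```python
import math
import random


def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def random_scale_cfg(eps_uncond, eps_cond, rng, low=2.0, high=6.0):
    """Apply CFG with a scale drawn per call instead of a fixed constant.

    The range [low, high] is a hypothetical placeholder, not a value
    taken from the paper.
    """
    scale = rng.uniform(low, high)
    return scale, cfg_combine(eps_uncond, eps_cond, scale)


def attend(query, keys, values):
    """Single-query scaled dot-product attention over a list of key/value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def negative_away_attention(query, pos_keys, pos_values, neg_keys, neg_values, alpha=0.5):
    """Assumed form of negative-prompt steering in cross-attention.

    Computes attention against the positive and the negative prompt tokens
    separately, then moves the output away from the negative one. The
    subtraction rule and alpha are assumptions made for illustration.
    """
    out_pos = attend(query, pos_keys, pos_values)
    out_neg = attend(query, neg_keys, neg_values)
    return [p - alpha * n for p, n in zip(out_pos, out_neg)]


if __name__ == "__main__":
    rng = random.Random(0)
    scale, guided = random_scale_cfg([0.0, 1.0], [1.0, 3.0], rng)
    print("sampled scale:", scale, "guided prediction:", guided)
    steered = negative_away_attention(
        [1.0, 0.0], [[1.0, 0.0]], [[2.0, 4.0]], [[0.0, 1.0]], [[1.0, 2.0]]
    )
    print("steered attention output:", steered)
```

In this sketch the guidance scale changes on every call, which is the property the abstract credits with broadening the teachers' output distributions. The attention variant needs no training because it only recombines two attention outputs at inference time.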
