Recent approaches have yielded promising results in distilling multi-step text-to-image diffusion models into one-step ones. The state-of-the-art efficient distillation technique, SwiftBrushv2 (SBv2), even surpasses the teacher model's performance with limited resources. However, our study reveals that it becomes unstable across different diffusion model backbones because it uses a fixed guidance scale within the Variational Score Distillation (VSD) loss. Another weakness of existing one-step diffusion models is the lack of support for negative prompt guidance, which is crucial in practical image generation. This paper presents SNOOPI, a novel framework designed to address these limitations by enhancing the guidance in one-step diffusion models during both training and inference. First, we enhance training stability through Proper Guidance-SwiftBrush (PG-SB), which employs a random-scale classifier-free guidance approach. By varying the guidance scale of both teacher models, we broaden their output distributions, resulting in a more robust VSD loss that enables SwiftBrush (SB) to perform effectively across diverse backbones while maintaining competitive performance. Second, we propose a training-free method called Negative-Away Steer Attention (NASA), which integrates negative prompts into one-step diffusion models via cross-attention to suppress undesired elements in generated images. Our experimental results show that the proposed methods significantly improve baseline models across various metrics. Remarkably, we achieve an HPSv2 score of 31.08, setting a new state of the art for one-step diffusion models.
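The two guidance ideas named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The function names, the sampling range `[2.0, 8.0]`, the weight `alpha`, and the subtractive form used for negative-prompt steering are all assumptions made for illustration; the abstract states only that the guidance scale is randomized and that negative prompts are integrated via cross-attention.

```python
import math
import random


def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: extrapolate from the
    unconditional prediction toward the conditional one."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def sample_guidance_scale(rng, low=2.0, high=8.0):
    """Draw a fresh guidance scale for each training step instead of a
    fixed one. The range [low, high] is a placeholder assumption."""
    return rng.uniform(low, high)


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Single-query scaled dot-product attention over prompt tokens."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]


def negative_steered_attention(query, pos_kv, neg_kv, alpha=0.5):
    """Hypothetical steering rule: attend to the positive and negative
    prompts separately, then subtract the scaled negative features."""
    pos = cross_attention(query, *pos_kv)
    neg = cross_attention(query, *neg_kv)
    return [p - alpha * n for p, n in zip(pos, neg)]


# Usage: one random scale per step, then one steered attention output.
rng = random.Random(0)
scale = sample_guidance_scale(rng)
guided = cfg_combine([0.0, 1.0], [1.0, 3.0], scale)
steered = negative_steered_attention(
    [1.0, 0.0],
    ([[1.0, 0.0]], [[2.0, 4.0]]),  # (keys, values) for the positive prompt
    ([[0.0, 1.0]], [[1.0, 1.0]]),  # (keys, values) for the negative prompt
)
```

Steering at the feature level matters in the one-step setting because there is no iterative denoising loop in which a negative prompt could be applied through ordinary classifier-free guidance at inference time.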