We propose a diffusion distillation method that achieves a new state of the art in one-step and few-step 1024px text-to-image generation based on SDXL. Our method combines progressive and adversarial distillation to balance quality and mode coverage. In this paper, we discuss the theoretical analysis, discriminator design, model formulation, and training techniques. We open-source our distilled SDXL-Lightning models as both LoRA and full UNet weights.