We propose a diffusion distillation method that achieves a new state of the art in one-step and few-step 1024px text-to-image generation based on SDXL. Our method combines progressive and adversarial distillation to balance quality and mode coverage. In this paper, we discuss the theoretical analysis, discriminator design, model formulation, and training techniques. We open-source our distilled SDXL-Lightning models as both LoRA and full UNet weights.