In this paper, we propose Flash Diffusion: an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models. The method reaches state-of-the-art performance in terms of FID and CLIP-Score for few-step image generation on the COCO2014 and COCO2017 datasets, while requiring only several GPU hours of training and fewer trainable parameters than existing methods. Beyond its efficiency, the versatility of the method is demonstrated across several tasks (text-to-image, inpainting, face-swapping, and super-resolution) and across different backbones, including UNet-based denoisers (SD1.5, SDXL) and DiT (Pixart-$\alpha$), as well as adapters. In all cases, the method drastically reduces the number of sampling steps while maintaining very high-quality image generation. The official implementation is available at https://github.com/gojasper/flash-diffusion.