Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks, producing high-quality images with minimal inference steps. LCMs are distilled from pre-trained latent diffusion models (LDMs), requiring only ~32 A100 GPU hours of training. This report extends the potential of LCMs in two respects. First, by applying LoRA distillation to Stable-Diffusion models including SD-V1.5, SSD-1B, and SDXL, we expand the scope of LCMs to larger models with significantly less memory consumption, achieving superior image generation quality. Second, we identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA. LCM-LoRA can be plugged directly into various fine-tuned Stable-Diffusion models or LoRAs without training, and thus represents a universally applicable accelerator for diverse image generation tasks. Compared with previous numerical probability flow ODE (PF-ODE) solvers such as DDIM and DPM-Solver, LCM-LoRA can be viewed as a plug-in neural PF-ODE solver that possesses strong generalization abilities. Project page: https://github.com/luosiallen/latent-consistency-model.
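The plug-in claim can be illustrated with a toy sketch of how LoRA weights are commonly merged. The text above does not spell out the mechanism, so this assumes the standard LoRA formulation, in which each adapter contributes a low-rank update `(alpha / rank) * up @ down` that is added to the frozen base weight. Under that assumption, an acceleration adapter and a separately trained style adapter can be summed onto the same layer with no further training. All names here (`lora_delta`, `merge`, the toy matrices, and the mixing weights) are hypothetical and chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(up: Matrix, down: Matrix, alpha: float) -> Matrix:
    """Low-rank weight update (alpha / rank) * up @ down (assumed LoRA convention)."""
    rank = len(down)  # down has shape (rank, d_in)
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(up, down)]


def merge(base: Matrix, deltas: List[Matrix], weights: List[float]) -> Matrix:
    """Return base + sum_i weights[i] * deltas[i] without modifying base."""
    merged = [row[:] for row in base]
    for delta, w in zip(deltas, weights):
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                merged[i][j] += w * v
    return merged


# Toy 2x3 layer with two rank-1 adapters (values are illustrative only).
base = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
accel_delta = lora_delta(up=[[1.0], [2.0]], down=[[1.0, 0.0, 1.0]], alpha=1.0)
style_delta = lora_delta(up=[[0.0], [1.0]], down=[[2.0, 2.0, 0.0]], alpha=1.0)

# Acceleration adapter at full strength, style adapter at half strength.
merged = merge(base, [accel_delta, style_delta], [1.0, 0.5])
print(merged)
```

Because the merge is a plain weighted sum of weight updates, the base weights stay untouched and either adapter can be added or removed independently. The mixing weights are free parameters in this sketch, not values taken from the report.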