Latent Diffusion Models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, their iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), which enable swift inference with minimal steps on any pre-trained LDM, including Stable Diffusion (Rombach et al.). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of this ODE in latent space, reducing the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768×768, 2~4-step LCM takes only 32 A100 GPU hours to train. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference. Project Page: https://latent-consistency-models.github.io/
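To illustrate what "directly predicting the solution of the PF-ODE" means for few-step inference, the following is a minimal sketch of multistep consistency sampling in the style of Consistency Models. It is not the authors' implementation: the names `make_vp_schedule`, `toy_consistency_fn`, and `multistep_consistency_sample`, the cosine noise schedule, and the timestep values are all assumptions made for illustration, and the toy consistency function stands in for a trained LCM by treating the data distribution as a single point.

```python
import numpy as np


def make_vp_schedule(num_steps):
    """Hypothetical variance-preserving schedule with alpha_t^2 + sigma_t^2 = 1."""
    t = np.linspace(0.0, 1.0, num_steps)
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def toy_consistency_fn(z_t, t):
    """Stand-in for a trained LCM (assumption, not a real model).

    If the data distribution were a point mass at 2.0, every PF-ODE
    trajectory would end at 2.0, so the exact consistency function
    would map any (z_t, t) to that point.
    """
    return np.full_like(z_t, 2.0)


def multistep_consistency_sample(consistency_fn, alphas, sigmas, timesteps, shape, rng):
    """Few-step sampling: predict z_0, re-noise to a smaller t, predict again.

    `timesteps` is a decreasing list of schedule indices; the first entry
    is the terminal step, where z_t is assumed to be (nearly) pure noise.
    """
    z_t = rng.standard_normal(shape)
    z_0 = consistency_fn(z_t, timesteps[0])  # one-step prediction of the ODE solution
    for t in timesteps[1:]:
        noise = rng.standard_normal(shape)
        z_t = alphas[t] * z_0 + sigmas[t] * noise  # diffuse the estimate back to step t
        z_0 = consistency_fn(z_t, t)               # refine the prediction
    return z_0


rng = np.random.default_rng(0)
alphas, sigmas = make_vp_schedule(1000)
latent = multistep_consistency_sample(
    toy_consistency_fn, alphas, sigmas, [999, 759, 499, 259], (4, 8), rng
)
```

Each loop iteration costs one network evaluation, so a 4-entry `timesteps` list corresponds to 4-step sampling. In an actual LCM the consistency function would also take the text condition and guidance scale as inputs, and the returned latent would be decoded to an image by the LDM's decoder.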