Video diffusion models have been gaining increasing attention for their ability to produce videos that are both coherent and of high fidelity. However, the iterative denoising process makes them computationally intensive and time-consuming, which limits their applications. Inspired by the Consistency Model (CM), which distills pretrained image diffusion models to accelerate sampling with minimal steps, and by its successful extension to conditional image generation, the Latent Consistency Model (LCM), we propose AnimateLCM, which enables high-fidelity video generation within minimal steps. Instead of directly conducting consistency learning on the raw video dataset, we propose a decoupled consistency learning strategy that separates the distillation of image generation priors from that of motion generation priors, which improves training efficiency and enhances the visual quality of the generated videos. Additionally, to enable the combination of plug-and-play adapters from the Stable Diffusion community to achieve various functions (e.g., ControlNet for controllable generation), we propose an efficient strategy to adapt existing adapters to our distilled text-conditioned video consistency model, or to train adapters from scratch, without harming the sampling speed. We validate the proposed strategy on image-conditioned video generation and layout-conditioned video generation, achieving top-performing results on both. Experimental results confirm the effectiveness of the proposed method. Code and weights will be made public. More details are available at https://github.com/G-U-N/AnimateLCM.