In this paper, we present \textbf{\textit{FasterCache}}, a novel training-free strategy designed to accelerate the inference of video diffusion models while maintaining high-quality generation. By analyzing existing cache-based methods, we observe that \textit{directly reusing adjacent-step features degrades video quality due to the loss of subtle variations}. We further conduct a pioneering investigation into the acceleration potential of classifier-free guidance (CFG) and reveal significant redundancy between conditional and unconditional features within the same timestep. Capitalizing on these observations, we introduce FasterCache to substantially accelerate diffusion-based video generation. Our key contributions are a dynamic feature reuse strategy that preserves both feature distinctiveness and temporal continuity, and CFG-Cache, which optimizes the reuse of conditional and unconditional outputs to further enhance inference speed without compromising video quality. We empirically evaluate FasterCache on recent video diffusion models. Experimental results show that FasterCache significantly accelerates video generation (\eg 1.67$\times$ speedup on Vchitect-2.0) while keeping video quality comparable to the baseline, and consistently outperforms existing methods in both inference speed and video quality.
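To make the two contributions concrete, the sketch below illustrates the general flavor of such a scheme in PyTorch. It is a minimal illustration under our own assumptions: the function names (\texttt{dynamic\_feature\_reuse}, \texttt{cfg\_cache\_merge}), the linear extrapolation weight, and the low-/high-frequency split are hypothetical choices for exposition, not the paper's exact formulation.

\begin{verbatim}
import torch

def dynamic_feature_reuse(f_prev2: torch.Tensor, f_prev: torch.Tensor,
                          t: int, total_steps: int) -> torch.Tensor:
    # Instead of copying the cached feature verbatim, extrapolate from the
    # two most recently computed features so that the subtle step-to-step
    # variation lost by plain reuse is partially recovered.
    # The linear weight schedule below is an assumption for illustration.
    w = t / total_steps
    return f_prev + w * (f_prev - f_prev2)

def cfg_cache_merge(cond_out: torch.Tensor, cached_uncond: torch.Tensor,
                    low_freq_ratio: float = 0.25) -> torch.Tensor:
    # Exploit the redundancy between the conditional and unconditional CFG
    # branches at the same timestep: reuse the cached unconditional output
    # and compensate it with the freshly computed conditional branch in the
    # frequency domain. The band split is an assumed heuristic.
    cond_f = torch.fft.fftshift(
        torch.fft.fftn(cond_out, dim=(-2, -1)), dim=(-2, -1))
    unc_f = torch.fft.fftshift(
        torch.fft.fftn(cached_uncond, dim=(-2, -1)), dim=(-2, -1))
    h, w = cond_out.shape[-2:]
    ch, cw = h // 2, w // 2
    rh, rw = int(h * low_freq_ratio / 2), int(w * low_freq_ratio / 2)
    mask = torch.zeros(h, w, device=cond_out.device)
    mask[ch - rh: ch + rh, cw - rw: cw + rw] = 1.0  # centered low-freq band
    # Keep cached low frequencies, borrow fresh high frequencies.
    merged = unc_f * mask + cond_f * (1.0 - mask)
    merged = torch.fft.ifftshift(merged, dim=(-2, -1))
    return torch.fft.ifftn(merged, dim=(-2, -1)).real
\end{verbatim}

In this sketch, \texttt{dynamic\_feature\_reuse} replaces verbatim cache reuse with a cheap extrapolation, and \texttt{cfg\_cache\_merge} lets the unconditional forward pass be skipped on some timesteps by reconstructing its output from the cache and the conditional branch.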