Diffusion models achieve state-of-the-art video generation quality, but their inference remains expensive due to the large number of sequential denoising steps. This has motivated a growing line of research on accelerating diffusion inference. Among training-free acceleration methods, caching reduces computation by reusing previously computed model outputs across timesteps. Existing caching methods rely on heuristic criteria to choose cache/reuse timesteps and require extensive tuning. We address this limitation with a principled sensitivity-aware caching framework. Specifically, we formalize the caching error through an analysis of the model output sensitivity to perturbations in the denoising inputs, i.e., the noisy latent and the timestep, and show that this sensitivity is a key predictor of caching error. Based on this analysis, we propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis. Our framework provides a theoretical basis for adaptive caching, explains why prior empirical heuristics can be partially effective, and extends them to a dynamic, sample-specific approach. Experiments on Wan 2.1, CogVideoX, and LTX-Video show that SenCache achieves better visual quality than existing caching methods under similar computational budgets.
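To make the core idea concrete, the following is a minimal toy sketch of sensitivity-aware caching, not the paper's actual SenCache algorithm. It uses a cheap stand-in for the diffusion model and a hypothetical sensitivity proxy (relative change between the last two fresh outputs) to decide, per sample and per timestep, whether to reuse the cached output or recompute. All names (`denoise_step`, `sens_threshold`) are illustrative assumptions.

```python
import numpy as np

def denoise_step(latent, t):
    # Hypothetical stand-in for an expensive diffusion model forward pass.
    return latent * (1.0 - 0.01 * t) + 0.001 * np.sin(t)

def sensitivity_aware_sampling(latent, timesteps, sens_threshold=0.05):
    """Toy sketch: reuse the cached model output whenever a cheap
    sensitivity proxy (relative change between the last two fresh
    outputs) falls below a threshold, i.e., when the predicted
    caching error is small."""
    cached = None   # most recent fresh model output
    prev = None     # fresh output before that
    calls = 0       # number of actual model evaluations
    for t in timesteps:
        sensitivity = np.inf  # force a fresh call until two outputs exist
        if cached is not None and prev is not None:
            sensitivity = np.linalg.norm(cached - prev) / (np.linalg.norm(prev) + 1e-8)
        if sensitivity < sens_threshold:
            out = cached  # reuse: output is insensitive at this timestep
        else:
            out = denoise_step(latent, t)  # recompute: sensitivity too high
            calls += 1
            prev, cached = cached, out
        latent = out
    return latent, calls
```

Because the threshold is compared against a quantity computed from this sample's own trajectory, the set of reused timesteps varies per sample, which is the dynamic, sample-specific behavior the abstract contrasts with fixed heuristic schedules.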