Diffusion models have achieved remarkable success in content generation but often incur prohibitive computational costs due to iterative sampling. Recent feature-caching methods accelerate inference via temporal extrapolation, yet can suffer quality degradation from inaccurate modeling of the complex dynamics of feature evolution. We propose HiCache (Hermite Polynomial-based Feature Cache), a training-free acceleration framework that improves feature prediction by aligning the mathematical tools with the empirical properties of the features. Our key insight is that feature-derivative approximations in diffusion Transformers exhibit multivariate Gaussian characteristics, motivating Hermite polynomials as a potentially optimal basis for Gaussian-correlated processes. We further introduce a dual-scaling mechanism that ensures numerical stability while preserving predictive accuracy; the mechanism is effective both standalone and when integrated with TaylorSeer. Extensive experiments demonstrate HiCache's superiority: it achieves a 5.55x speedup on FLUX.1-dev while matching or exceeding baseline quality, and maintains strong performance across text-to-image, video-generation, and super-resolution tasks. Moreover, HiCache can be naturally combined with prior caching methods to enhance their performance, e.g., improving ClusCa's ImageReward score from 0.9480 to 0.9840. Code: https://github.com/fenglang918/HiCache
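To make the core idea concrete, here is a minimal, hypothetical sketch of Hermite-polynomial feature extrapolation: fit a probabilists' Hermite basis (numpy's `hermite_e` module) to features cached at recent timesteps, then evaluate the fit at the next timestep. The function and variable names (`predict_feature`, `cached_steps`, `scale`) are illustrative and not taken from the HiCache codebase; the `scale` argument only gestures at the paper's dual-scaling idea for numerical stability.

```python
import numpy as np
from numpy.polynomial import hermite_e as He  # probabilists' Hermite He_n

def predict_feature(cached_steps, cached_feats, target_step, degree=2, scale=1.0):
    """Extrapolate cached features to target_step via a Hermite-basis fit.

    cached_steps : 1-D array of timesteps where features were fully computed
    cached_feats : 2-D array, shape (len(cached_steps), dim), flattened features
    scale        : input scaling for numerical stability (illustrative only)
    """
    t = np.asarray(cached_steps, dtype=float) * scale
    # Least-squares fit in the He_0..He_degree basis; each feature dimension
    # (column of cached_feats) is fitted independently.
    coeffs = He.hermefit(t, np.asarray(cached_feats, dtype=float), deg=degree)
    # Evaluate the fitted expansion at the (scaled) target timestep.
    return He.hermeval(target_step * scale, coeffs)

# Toy usage: a quadratic feature trend is recovered exactly, since
# He_0..He_2 span all quadratic polynomials.
steps = np.array([0, 1, 2, 3])
feats = np.stack([steps**2, 3.0 * steps + 1.0], axis=1)  # shape (4, 2)
pred = predict_feature(steps, feats, target_step=4)
print(np.round(pred, 4))  # close to [16., 13.]
```

A degree-2 fit over the last few cached steps keeps the extrapolation cheap relative to a full Transformer forward pass, which is what makes caching-plus-prediction a net speedup.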