Diffusion models have become a cornerstone of modern generative AI owing to their exceptional generation quality and controllability. However, their inherent \textit{multi-step iterations} and \textit{complex backbone networks} incur prohibitive computational overhead and generation latency, forming a major bottleneck for real-time applications. Although existing acceleration techniques have made progress, they still face challenges such as limited applicability, high training costs, or quality degradation. Against this backdrop, \textbf{Diffusion Caching} offers a promising training-free, architecture-agnostic, and efficient inference paradigm. Its core mechanism identifies and reuses intrinsic computational redundancies in the diffusion process: by enabling feature-level cross-step reuse and inter-layer scheduling, it reduces computation without modifying model parameters. This paper systematically reviews the theoretical foundations and evolution of Diffusion Caching and proposes a unified framework for its classification and analysis. Through comparative analysis of representative methods, we show that Diffusion Caching has evolved from \textit{static reuse} toward \textit{dynamic prediction}. This trend enhances caching flexibility across diverse tasks and enables integration with other acceleration techniques, such as sampling optimization and model distillation, paving the way for a unified, efficient inference framework for future multimodal and interactive applications. We argue that this paradigm will become a key enabler of real-time, efficient generative AI, injecting new vitality into both the theory and practice of \textit{Efficient Generative Intelligence}.