H2-Cache: A Novel Hierarchical Dual-Stage Cache for High-Performance Acceleration of Generative Diffusion Models

Diffusion models have emerged as the state of the art in image generation, but their practical deployment is hindered by the significant computational cost of their iterative denoising process. While existing caching techniques can accelerate inference, they often force a difficult trade-off between speed and fidelity, suffering from quality degradation and high computational overhead. To address these limitations, we introduce H2-Cache, a novel hierarchical caching mechanism designed for modern generative diffusion model architectures. Our method is founded on the key insight that the denoising process can be functionally separated into a structure-defining stage and a detail-refining stage. H2-Cache leverages this by employing a dual-threshold system, using independent thresholds to selectively cache each stage. To keep this dual-check approach efficient, we introduce pooled feature summarization (PFS), a lightweight technique for robust and fast similarity estimation. Extensive experiments on the Flux architecture demonstrate that H2-Cache achieves significant acceleration (up to 5.08x) while maintaining image quality nearly identical to the baseline, quantitatively and qualitatively outperforming existing caching methods. Our work presents a robust and practical solution that effectively resolves the speed-quality dilemma, significantly lowering the barrier to real-world application of high-fidelity diffusion models. Source code is available at https://github.com/Bluear7878/H2-cache-A-Hierarchical-Dual-Stage-Cache.
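The dual-threshold idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity metric, the pooling size, the threshold values, or how the two stages map onto Flux blocks. The names `pooled_summary`, `relative_change`, and `StageCache`, the relative L1 metric, and the toy stage functions are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Optional

Vector = List[float]


def pooled_summary(features: Vector, pool: int = 4) -> Vector:
    """Average-pool a flat feature vector into windows of `pool` values.

    Assumption: stands in for pooled feature summarization (PFS); the
    real method pools model feature maps, not plain Python lists.
    """
    summary = []
    for start in range(0, len(features), pool):
        window = features[start:start + pool]
        summary.append(sum(window) / len(window))
    return summary


def relative_change(current: Vector, reference: Vector) -> float:
    """Relative L1 distance between two summaries (assumed metric)."""
    diff = sum(abs(c - r) for c, r in zip(current, reference))
    norm = sum(abs(r) for r in reference)
    return diff / (norm + 1e-8)


class StageCache:
    """Wraps one stage and reuses its output while the input barely changes."""

    def __init__(self, fn: Callable[[Vector], Vector], threshold: float,
                 pool: int = 4) -> None:
        self.fn = fn
        self.threshold = threshold
        self.pool = pool
        self.ref_summary: Optional[Vector] = None  # summary at last real compute
        self.output: Optional[Vector] = None       # cached stage output
        self.hits = 0
        self.misses = 0

    def __call__(self, x: Vector) -> Vector:
        summary = pooled_summary(x, self.pool)
        if (self.ref_summary is not None
                and relative_change(summary, self.ref_summary) < self.threshold):
            self.hits += 1
            return self.output
        # Input moved too far: recompute and refresh the reference summary.
        self.ref_summary = summary
        self.output = self.fn(x)
        self.misses += 1
        return self.output


# Toy stand-ins for the structure-defining and detail-refining stages.
structure = StageCache(lambda x: [2.0 * v for v in x], threshold=0.005)
detail = StageCache(lambda h: [v + 1.0 for v in h], threshold=0.05)


def denoise_step(x: Vector) -> Vector:
    """One step: each stage is checked against its own threshold."""
    return detail(structure(x))


out0 = denoise_step([1.0] * 8)    # first step: both stages computed
out1 = denoise_step([1.01] * 8)   # structure recomputed, detail reused
out2 = denoise_step([1.012] * 8)  # both stages reused
out3 = denoise_step([2.0] * 8)    # large change: both stages recomputed
```

In this sketch each stage compares against the summary taken at its last real computation rather than at the previous step, so small changes cannot accumulate unnoticed; that is a design choice of the example, not something the abstract states. Because the thresholds are independent, the second step recomputes the structure stage yet still reuses the cached detail output.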
