Quantization and cache mechanisms are typically applied individually to accelerate Diffusion Transformers (DiTs), and each has demonstrated notable potential for acceleration. However, the benefit of combining the two mechanisms for efficient generation remains under-explored. Through empirical investigation, we find that combining quantization and caching for DiTs is not straightforward, and two key challenges lead to catastrophic performance degradation: (i) the sample efficacy of calibration datasets in post-training quantization (PTQ) is significantly degraded by the cache operation; (ii) the combination of the two mechanisms introduces more severe exposure bias in the sampling distribution, amplifying error accumulation during image generation. In this work, we exploit both acceleration mechanisms and propose a hybrid acceleration method that tackles the above challenges, aiming to further improve the efficiency of DiTs while preserving strong generation capability. Concretely, a temporal-aware parallel clustering (TAP) method is designed to dynamically improve calibration sample selection within PTQ across different diffusion steps, and a variance compensation (VC) strategy is derived to correct the sampling distribution, mitigating exposure bias through adaptive correction-factor generation. Extensive experiments show that our method accelerates DiTs by 12.7x while preserving competitive generation capability. The code will be available at https://github.com/xinding-sys/Quant-Cache.
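To make the TAP idea concrete, below is a minimal, hypothetical sketch of timestep-wise calibration sample selection in the same spirit: activations are clustered independently per diffusion step, and the sample nearest each centroid is kept so the calibration set covers that step's activation distribution. The function name, the choice of k-means, and the feature representation are all assumptions for illustration, not the paper's exact algorithm.

```python
# Hypothetical sketch of temporal-aware calibration sample selection
# (TAP-style); clustering choice and interfaces are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def select_calibration_samples(feats_per_step, n_samples):
    """feats_per_step: dict mapping timestep t -> (N, D) activation features.
    Returns a dict mapping t -> indices of representative calibration samples."""
    selected = {}
    for t, feats in feats_per_step.items():
        k = min(n_samples, len(feats))
        km = KMeans(n_clusters=k, n_init=10).fit(feats)
        # Keep the sample closest to each centroid so the calibration set
        # reflects this timestep's activation distribution.
        idx = [int(np.argmin(np.linalg.norm(feats - c, axis=1)))
               for c in km.cluster_centers_]
        selected[t] = idx
    return selected
```

Because each timestep is clustered independently, the per-step loops are embarrassingly parallel, which is consistent with the "parallel" aspect of TAP.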
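Similarly, the following is a minimal sketch of the variance-compensation idea: rescale the output produced under quantization plus caching so its per-channel variance matches a full-precision reference, which counteracts the distribution shift behind exposure bias. The specific correction-factor formula is an assumption for illustration, not the paper's derivation.

```python
# Hypothetical sketch of variance compensation (VC); the correction-factor
# formula is an assumption, not the paper's derived strategy.
import torch

def variance_compensate(x_q, var_ref, eps=1e-6):
    """x_q: (B, C, H, W) model output under quantization + caching.
    var_ref: (C,) reference per-channel variance from the FP model.
    Rescales each channel so its variance matches var_ref."""
    var_q = x_q.var(dim=(0, 2, 3), unbiased=False)      # (C,) observed variance
    scale = torch.sqrt(var_ref / (var_q + eps))          # adaptive correction factor
    mean_q = x_q.mean(dim=(0, 2, 3), keepdim=True)
    # Center, rescale the deviation, and restore the mean.
    return (x_q - mean_q) * scale.view(1, -1, 1, 1) + mean_q
```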