Exposure Bias Reduction for Enhancing Diffusion Transformer Feature Caching

Diffusion Transformers (DiTs) have exhibited impressive generation capabilities but face great challenges due to their high computational complexity. To address this problem, various acceleration methods, notably feature caching, have been introduced. However, these approaches focus on aligning with the non-cache diffusion process without analyzing how caching affects the generation of intermediate steps, which leaves room for analysis and improvement. In this paper, we analyze the impact of caching on the signal-to-noise ratio (SNR) of the diffusion process and find that feature caching intensifies the denoising procedure, which we further identify as a more severe form of exposure bias. Drawing on this insight, we introduce EB-Cache, a joint caching strategy that aligns with the diffusion process free of exposure bias, which offers a higher performance ceiling. Our approach builds on a comprehensive understanding of caching mechanisms and offers a novel perspective on leveraging caches to accelerate diffusion. Empirical results indicate that EB-Cache improves model performance while simultaneously accelerating generation. Specifically, in the 50-step generation process, EB-Cache achieves a 1.49$\times$ speedup while reducing FID by 0.63 from 3.69 (to 3.06), surpassing prior acceleration methods. Code will be available at \href{https://github.com/aSleepyTree/EB-Cache}{https://github.com/aSleepyTree/EB-Cache}.
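For reference, the SNR discussed above is, under the standard variance-preserving forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, given by $\mathrm{SNR}(t) = \bar\alpha_t / (1-\bar\alpha_t)$. The sketch below computes this quantity along a 50-step sampling trajectory. It assumes a DDPM-style linear $\beta$ schedule (from 1e-4 to 0.02 over 1000 training steps) and evenly spaced sampling timesteps; both choices, and the helper names `linear_alpha_bars` and `sampling_timesteps`, are illustrative assumptions. The sketch only shows the nominal SNR schedule of a cache-free sampler. It does not implement EB-Cache or measure how caching shifts the SNR.

```python
import math

# Assumption: DDPM-style linear beta schedule over 1000 training steps.
NUM_TRAIN_STEPS = 1000
NUM_SAMPLE_STEPS = 50


def linear_alpha_bars(num_train_steps=NUM_TRAIN_STEPS, beta_start=1e-4, beta_end=0.02):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars = []
    prod = 1.0
    for i in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_train_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def snr(alpha_bar):
    """SNR(t) = alpha_bar_t / (1 - alpha_bar_t) for the variance-preserving process."""
    return alpha_bar / (1.0 - alpha_bar)


def sampling_timesteps(num_train_steps=NUM_TRAIN_STEPS, num_sample_steps=NUM_SAMPLE_STEPS):
    """Hypothetical evenly spaced timesteps, ordered from high noise to low noise."""
    stride = num_train_steps // num_sample_steps
    return list(range(num_train_steps - 1, -1, -stride))


alpha_bars = linear_alpha_bars()
timesteps = sampling_timesteps()
snrs = [snr(alpha_bars[t]) for t in timesteps]

# SNR rises monotonically as sampling moves from pure noise toward clean data.
for t, s in zip(timesteps, snrs):
    print(f"t={t:4d}  SNR={s:.3e}  log-SNR={math.log(s):+.2f}")
```

Each sampling step is expected to move the sample to a specific SNR level. A sampler whose intermediate states drift from these nominal levels is one way to picture the train/inference mismatch that the exposure-bias framing refers to.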
