Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this performance comes at the cost of a computationally expensive architecture, particularly due to the attention modules used heavily in leading models. Existing work mainly relies on retraining to improve DM efficiency, which is computationally expensive and scales poorly. To address this, we introduce the Attention-driven Training-free Efficient Diffusion Model (AT-EDM) framework, which leverages attention maps to prune redundant tokens at run time without any retraining. Specifically, for single-denoising-step pruning, we develop a novel ranking algorithm, Generalized Weighted Page Rank (G-WPR), to identify redundant tokens, and a similarity-based recovery method to restore tokens for the convolution operation. In addition, we propose a Denoising-Steps-Aware Pruning (DSAP) approach that adjusts the pruning budget across denoising timesteps for better generation quality. Extensive evaluations show that AT-EDM performs favorably against prior art in terms of efficiency (e.g., 38.8% FLOPs savings and up to 1.53× speed-up over Stable Diffusion XL) while maintaining nearly the same FID and CLIP scores as the full model. Project webpage: https://atedm.github.io.
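To make the single-step pruning idea concrete, the following is a minimal sketch of attention-based token ranking followed by similarity-based recovery. It is not the G-WPR algorithm itself, whose exact formulation is not given here. It assumes a plain PageRank-style vote in which each token distributes its score to the tokens it attends to, and it assumes that a pruned token is recovered by copying the retained token it attends to most. The function names `rank_tokens`, `select_kept`, and `recover_tokens` are hypothetical.

```python
def rank_tokens(attn, iters=2):
    """Score tokens by PageRank-style voting (illustrative assumption).

    attn[i][j] is the attention that query token i pays to key token j;
    each row is assumed to sum to 1.
    """
    n = len(attn)
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Token j collects votes from every token i, weighted by
        # attn[i][j] and by the current score of the voter i.
        new = [sum(attn[i][j] * scores[i] for i in range(n)) for j in range(n)]
        total = sum(new)
        scores = [s / total for s in new]
    return scores


def select_kept(scores, keep):
    """Return the indices of the `keep` highest-scoring tokens, in ascending order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:keep])


def recover_tokens(kept_features, kept, attn):
    """Rebuild a full-length token sequence, e.g. before a convolution.

    Assumption: a pruned token is filled with a copy of the retained token
    it attends to most strongly.
    """
    slot = {tok: idx for idx, tok in enumerate(kept)}
    full = []
    for tok in range(len(attn)):
        if tok in slot:
            source = tok
        else:
            source = max(kept, key=lambda k: attn[tok][k])
        full.append(list(kept_features[slot[source]]))
    return full


# Toy attention map over 4 tokens; token 3 receives almost no attention.
attn = [
    [0.5, 0.3, 0.2, 0.0],
    [0.4, 0.4, 0.2, 0.0],
    [0.3, 0.3, 0.4, 0.0],
    [0.4, 0.3, 0.2, 0.1],
]
scores = rank_tokens(attn)
kept = select_kept(scores, keep=3)           # token 3 is pruned
kept_features = [[1.0], [2.0], [3.0]]        # placeholder outputs for kept tokens
full = recover_tokens(kept_features, kept, attn)
```

In this toy example the pruned token attends most to token 0, so its slot is filled with a copy of token 0's feature. A step-aware variant in the spirit of DSAP would vary `keep` across denoising timesteps; the schedule is not specified here, so it is left out of the sketch.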