SpeedUpNet: A Plug-and-Play Hyper-Network for Accelerating Text-to-Image Diffusion Models

Comment from arXiv: Table 1 shows the comparison with existing methods, but the lack of experimental data for the LCM method under 12 steps makes the table incomplete. We need to temporarily withdraw the manuscript and conduct the corresponding experiments before resubmitting it.

Text-to-image diffusion models such as Stable Diffusion (SD) have advanced significantly but require extensive computational resources. Although many acceleration methods have been proposed, they either degrade generation quality or incur extra training cost when generalizing to new fine-tuned models. To address these limitations, we propose a novel and universal SD acceleration module called SpeedUpNet (SUN). SUN can be plugged directly into various fine-tuned SD models without extra training. It uses cross-attention layers to learn the relative offsets in the generated images between negative and positive prompts, which achieves classifier-free guidance distillation while keeping negative prompts controllable, and it introduces a Multi-Step Consistency (MSC) loss to balance reducing inference steps against maintaining consistency in the generated output. Consequently, SUN reduces the number of inference steps to just 4 and eliminates the need for classifier-free guidance. This yields an overall speedup of more than 10 times for SD models compared to the state-of-the-art 25-step DPM-Solver++, and offers two extra advantages: (1) classifier-free guidance distillation with controllable negative prompts and (2) seamless integration into various fine-tuned SD models without training. The effectiveness of SUN has been verified through extensive experiments. Project Page: https://williechai.github.io/speedup-plugin-for-stable-diffusions.github.io
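The claimed speedup of more than 10 times can be sanity-checked with a rough count of denoiser forward passes. The sketch below is not from the paper. It assumes that sampling cost is dominated by denoiser (UNet) evaluations, that classifier-free guidance costs two evaluations per step (one for the positive prompt, one for the negative or empty prompt), and that the overhead of the SUN module itself is negligible. The function name `denoiser_evals` is hypothetical.

```python
def denoiser_evals(steps: int, uses_cfg: bool) -> int:
    """Count denoiser forward passes needed to sample one image.

    Assumption: classifier-free guidance evaluates the denoiser twice
    per step (positive prompt and negative/empty prompt); without it,
    once per step.
    """
    passes_per_step = 2 if uses_cfg else 1
    return steps * passes_per_step


# Baseline from the abstract: 25-step DPM-Solver++ with classifier-free guidance.
baseline_evals = denoiser_evals(steps=25, uses_cfg=True)

# SUN as described in the abstract: 4 steps, no classifier-free guidance.
# The cost of the SUN module itself is ignored here (assumption).
sun_evals = denoiser_evals(steps=4, uses_cfg=False)

speedup = baseline_evals / sun_evals
print(baseline_evals, sun_evals, speedup)  # 50 4 12.5
```

Under these assumptions the ratio is 50 / 4 = 12.5, which is consistent with the abstract's figure of more than 10 times. Measured wall-clock speedup would also depend on the text encoder, the VAE decoder, and any cost added by SUN.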
