Diffusion models have achieved remarkable success in image generation tasks, yet their practical deployment is constrained by high memory and time consumption. While quantization offers a path to compressing and accelerating diffusion models, existing methods fail completely when the models are quantized to low bit-widths. In this paper, we identify three properties of quantized diffusion models that compromise the efficacy of current methods: imbalanced activation distributions, imprecise temporal information, and vulnerability to perturbations of specific modules. To alleviate the intensified low-bit quantization difficulty stemming from the distribution imbalance, we propose finetuning the quantized model to better adapt to the activation distribution. Building on this idea, we identify two critical types of quantized layers, those holding vital temporal information and those sensitive to reduced bit-width, and finetune them to mitigate performance degradation efficiently. We empirically verify that our approach modifies the activation distribution and provides meaningful temporal information, facilitating easier and more accurate quantization. Our method is evaluated on three high-resolution image generation tasks and achieves state-of-the-art performance under various bit-width settings; it is also the first method to generate readable images on fully 4-bit (i.e., W4A4) Stable Diffusion. Code has been made publicly available.
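As a rough illustration of why imbalanced activation distributions make low-bit quantization harder, the sketch below applies plain min-max uniform quantization to two toy activation sets. This is a generic textbook scheme, not the quantizer or the finetuning procedure proposed in the paper; the function names (`fake_quantize`, `mean_abs_error`) and the toy data are hypothetical and chosen only for illustration.

```python
def fake_quantize(values, num_bits=4):
    """Uniform asymmetric min-max quantization followed by dequantization.

    Generic scheme used for illustration only (assumption: not the paper's quantizer).
    """
    levels = 2 ** num_bits - 1          # number of quantization steps
    lo, hi = min(values), max(values)
    if hi == lo:                        # constant input: nothing to quantize
        return list(values)
    scale = (hi - lo) / levels          # width of one quantization step
    # Snap each value to the nearest grid point, then map back to real values.
    return [lo + round((v - lo) / scale) * scale for v in values]


def mean_abs_error(values, num_bits=4):
    """Average absolute error introduced by fake quantization."""
    quantized = fake_quantize(values, num_bits)
    return sum(abs(a - b) for a, b in zip(values, quantized)) / len(values)


# Balanced toy activations: evenly spread over [-1, 1].
balanced = [i / 50 - 1 for i in range(101)]
# Imbalanced toy activations: the same bulk plus a single large outlier.
imbalanced = balanced + [20.0]

err_balanced_4bit = mean_abs_error(balanced, num_bits=4)
err_imbalanced_4bit = mean_abs_error(imbalanced, num_bits=4)
err_balanced_8bit = mean_abs_error(balanced, num_bits=8)

print("4-bit, balanced:  ", err_balanced_4bit)
print("4-bit, imbalanced:", err_imbalanced_4bit)
print("8-bit, balanced:  ", err_balanced_8bit)
```

In this toy setting a single outlier stretches the quantization range, so most of the 16 available 4-bit levels are spent on values that rarely occur and the bulk of the activations collapses onto very few grid points. With 8 bits the same range is covered far more finely, which is consistent with the observation that the difficulty intensifies at low bit-widths.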