With the advancement of diffusion models (DMs) and their substantially increased computational requirements, quantization has emerged as a practical way to obtain compact and efficient low-bit DMs. However, the highly discrete representation of ultra-low-bit weights causes severe accuracy degradation, which hinders quantizing DMs to ultra-low bit-widths. This paper proposes BinaryDM, a novel quantization-aware training approach for DMs that pushes their weights toward accurate and efficient binarization by taking both representation and computation properties into account. From the representation perspective, we present a Learnable Multi-basis Binarizer (LMB) to recover the representations generated by the binarized DM. The LMB enhances detailed information through a flexible combination of dual binary bases, and it is applied only to parameter-sparse locations of the DM architecture so that the added overhead stays small. From the optimization perspective, we apply Low-rank Representation Mimicking (LRM) to assist the optimization of binarized DMs. The LRM mimics the representations of full-precision DMs in a low-rank space, alleviating the ambiguity in optimization direction that fine-grained alignment causes. Moreover, BinaryDM uses a quick progressive warm-up that quantizes the model layer by layer at the beginning of training, avoiding convergence difficulties. Comprehensive experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains over state-of-the-art (SOTA) quantization methods for DMs at ultra-low bit-widths. With 1.1-bit weights and 4-bit activations (W1.1A4), BinaryDM achieves an FID as low as 7.11 and prevents the performance collapse seen in the baseline (FID 39.69). As the first binarization method for diffusion models, W1.1A4 BinaryDM delivers 9.3× savings in OPs and 24.8× savings in model size, showcasing its substantial potential for edge deployment.
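The three training-time mechanisms named above can be illustrated with a minimal NumPy sketch. This is not the BinaryDM implementation; it rests on several assumptions. The dual-basis binarizer is assumed to take a residual form, w ≈ α1·sign(w) + α2·sign(w − α1·sign(w)). The two scales are initialised with mean absolute values, whereas in the paper they are learnable and trained jointly during QAT. "Low-rank space" is assumed to mean projecting features onto the top principal directions of the full-precision features. The warm-up is assumed to follow a simple linear layer-by-layer schedule. All function names are hypothetical, and the sketch does not model which parameter-sparse layers receive the second basis.

```python
import numpy as np


def sign_basis(x: np.ndarray) -> np.ndarray:
    # Binary basis in {-1, +1}; zeros are mapped to +1.
    return np.where(x >= 0, 1.0, -1.0)


def single_basis_binarize(w: np.ndarray) -> np.ndarray:
    # Classic 1-bit binarization with the L2-optimal scale mean(|w|).
    return float(np.mean(np.abs(w))) * sign_basis(w)


def init_scales(w: np.ndarray) -> tuple[float, float]:
    # Hypothetical initialisation of the two (normally learnable) scales:
    # alpha1 fits w, alpha2 fits the residual left by the first basis.
    alpha1 = float(np.mean(np.abs(w)))
    residual = w - alpha1 * sign_basis(w)
    alpha2 = float(np.mean(np.abs(residual)))
    return alpha1, alpha2


def multi_basis_binarize(w: np.ndarray, alpha1: float, alpha2: float) -> np.ndarray:
    # Assumed residual form of a dual-basis binarizer.
    b1 = sign_basis(w)                       # first binary basis
    b2 = sign_basis(w - alpha1 * b1)         # second basis on the residual
    return alpha1 * b1 + alpha2 * b2


def low_rank_mimic_loss(f_bin: np.ndarray, f_fp: np.ndarray, rank: int) -> float:
    # Features are (num_samples, channels). Project both onto the top-`rank`
    # principal directions of the full-precision features, then compare.
    centered = f_fp - f_fp.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = vt[:rank].T                       # (channels, rank)
    return float(np.mean((f_bin @ proj - f_fp @ proj) ** 2))


def layers_to_quantize(step: int, num_layers: int, warmup_steps: int) -> int:
    # Hypothetical linear warm-up: quantize one more layer every
    # warmup_steps / num_layers steps until all layers are binarized.
    if step >= warmup_steps:
        return num_layers
    return min(num_layers, 1 + step * num_layers // warmup_steps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 64))
    a1, a2 = init_scales(w)
    err_1 = np.sum((w - single_basis_binarize(w)) ** 2)
    err_2 = np.sum((w - multi_basis_binarize(w, a1, a2)) ** 2)
    print(f"1-basis error {err_1:.1f}, 2-basis error {err_2:.1f}")
```

With these scale choices the second basis can never increase the squared reconstruction error relative to a single basis, because α2 = mean(|r|) minimises ‖r − α·sign(r)‖² and α = 0 would leave the single-basis error unchanged. This gives some intuition for how a second binary basis can recover detail that plain 1-bit weights lose.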