BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models

As diffusion models (DMs) advance and their computational requirements grow substantially, quantization has emerged as a practical way to obtain compact and efficient low-bit DMs. However, the highly discrete representation causes severe accuracy degradation, which hinders the quantization of diffusion models to ultra-low bit-widths. This paper proposes a novel weight binarization approach for DMs, namely BinaryDM, which makes binarized DMs accurate and efficient by improving both representation and optimization. From the representation perspective, we present an Evolvable-Basis Binarizer (EBB) that enables a smooth evolution of DMs from full precision to accurately binarized. EBB enhances information representation in the initial stage through a flexible combination of multiple binary bases, then applies regularization so that the weights evolve into efficient single-basis binarization. The evolution occurs only in the head and tail of the DM architecture to keep training stable. From the optimization perspective, Low-rank Representation Mimicking (LRM) is applied to assist the optimization of binarized DMs. LRM mimics the representations of full-precision DMs in a low-rank space, alleviating the direction ambiguity that fine-grained alignment introduces into the optimization process. Comprehensive experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains over SOTA quantization methods for DMs under ultra-low bit-widths. With 1-bit weights and 4-bit activations (W1A4), BinaryDM achieves an FID as low as 7.74 and saves the performance from collapse (baseline FID 10.87). As the first binarization method for diffusion models, W1A4 BinaryDM achieves 15.2x savings in OPs and 29.2x savings in model size, showcasing its substantial potential for edge deployment.
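To make the idea of combining multiple binary bases and then evolving toward a single basis more concrete, here is a minimal sketch in plain Python. It is not the paper's EBB formulation: it assumes a simple residual scheme in which a second binary basis approximates the error left by the first, and it uses a hypothetical `second_scale` factor as a stand-in for the effect of regularization shrinking the second basis to zero. All function names are illustrative.

```python
def sign(x):
    # Map to {-1, +1}; zero goes to +1 so every weight stays binary.
    return 1.0 if x >= 0 else -1.0

def mean_abs(values):
    # Mean absolute value, used here as the scaling factor of a basis.
    return sum(abs(v) for v in values) / len(values)

def binarize_single(weights):
    # Single-basis binarization: w ~ a1 * sign(w).
    a1 = mean_abs(weights)
    return [a1 * sign(w) for w in weights]

def binarize_two_basis(weights, second_scale=1.0):
    # Two-basis approximation (assumed residual scheme):
    # w ~ a1 * sign(w) + second_scale * a2 * sign(w - a1 * sign(w)).
    a1 = mean_abs(weights)
    first = [a1 * sign(w) for w in weights]
    residual = [w - f for w, f in zip(weights, first)]
    a2 = mean_abs(residual)
    return [f + second_scale * a2 * sign(r) for f, r in zip(first, residual)]

def mse(a, b):
    # Mean squared reconstruction error between two weight vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

weights = [0.9, -0.1, 0.4, -0.7, 0.05, -0.3]

err_single = mse(weights, binarize_single(weights))
err_double = mse(weights, binarize_two_basis(weights))
# With the second basis fully decayed, the result is single-basis again.
err_decayed = mse(weights, binarize_two_basis(weights, second_scale=0.0))

print(err_single, err_double, err_decayed)
```

Under these assumptions, the two-basis form reconstructs the weights with lower error than the single-basis form, which is the extra representational capacity available early in training. Setting `second_scale` to zero recovers the single-basis result, which is the form that needs only 1-bit weights at inference.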

