TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

Diffusion models, a prevalent framework for image generation, face significant obstacles to broad applicability because of their long inference times and substantial memory requirements. Efficient Post-training Quantization (PTQ) is pivotal for addressing these issues in traditional models. Unlike traditional models, diffusion models depend heavily on the time-step $t$ to achieve satisfactory multi-round denoising. Usually, $t$ from the finite set $\{1, \ldots, T\}$ is encoded into a temporal feature by a few modules that are entirely independent of the sampling data. However, existing PTQ methods do not optimize these modules separately. They adopt inappropriate reconstruction targets and complex calibration methods, which severely disturbs the temporal feature and the denoising trajectory and yields low compression efficiency. To address these problems, we propose a Temporal Feature Maintenance Quantization (TFMQ) framework built upon a Temporal Information Block, which depends only on the time-step $t$ and not on the sampling data. Building on this block design, we devise temporal information aware reconstruction (TIAR) and finite set calibration (FSC) to align the quantized temporal features with their full-precision counterparts in a limited time. Equipped with this framework, we can preserve most of the temporal information and ensure end-to-end generation quality. Extensive experiments on various datasets and diffusion models demonstrate state-of-the-art results. Remarkably, our quantization approach, for the first time, achieves model performance nearly on par with the full-precision model under 4-bit weight quantization. Additionally, our method incurs almost no extra computational cost and accelerates quantization time by $2.0 \times$ on LSUN-Bedrooms $256 \times 256$ compared to previous works. Our code is publicly available at https://github.com/ModelTC/TFMQ-DM.
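The observation that $t$ comes from a finite set and is encoded without reference to the sampling data can be illustrated with a small NumPy sketch. It is not the authors' implementation: the sinusoidal embedding is the common DDPM-style choice, the projection matrix `W` is a random stand-in for the time-embedding modules, and the per-time-step min/max calibration in `fake_quantize` is only one plausible reading of what calibrating over a finite set could look like. All names and sizes here are assumptions made for illustration.

```python
import numpy as np

def sinusoidal_embedding(t, dim=32):
    """Common DDPM-style sinusoidal embedding of integer time-steps."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = np.asarray(t, dtype=np.float64)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(args), np.cos(args)], axis=1)

def fake_quantize(x, lo, hi, bits=8):
    """Uniform asymmetric quantize-then-dequantize of x over [lo, hi]."""
    levels = 2 ** bits - 1
    scale = max(hi - lo, 1e-12) / levels
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo

T = 100                                   # assumed number of time-steps
rng = np.random.default_rng(0)
# Hypothetical stand-in for the time-embedding modules (not trained weights).
W = rng.standard_normal((32, 16)) / np.sqrt(32)

# Because t is drawn from {1, ..., T}, every temporal feature can be
# enumerated up front; no image or noise sample is involved.
timesteps = np.arange(1, T + 1)
features = sinusoidal_embedding(timesteps) @ W          # shape (T, 16)

# Assumed calibration scheme: one (min, max) range per time-step.
ranges = [(f.min(), f.max()) for f in features]
quantized = np.stack(
    [fake_quantize(f, lo, hi) for f, (lo, hi) in zip(features, ranges)]
)

max_err = np.abs(quantized - features).max()
print(f"temporal features: {features.shape}, max abs error: {max_err:.6f}")
```

Since the set of time-steps is finite, the full-precision temporal features can be computed once and used directly as alignment targets, which is the property the abstract attributes to the Temporal Information Block.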

