TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

Diffusion models, a prevalent framework for image generation, face significant obstacles to broad applicability because of their long inference times and substantial memory requirements. Efficient Post-training Quantization (PTQ) is pivotal for addressing these issues in traditional models. Unlike traditional models, diffusion models depend heavily on the time-step $t$ to achieve satisfactory multi-round denoising. Usually, $t$ from the finite set $\{1, \ldots, T\}$ is encoded into a temporal feature by a few modules that are entirely independent of the sampling data. However, existing PTQ methods do not optimize these modules separately. They adopt inappropriate reconstruction targets and complex calibration methods, which severely disturbs the temporal feature and the denoising trajectory and yields low compression efficiency. To address these problems, we propose a Temporal Feature Maintenance Quantization (TFMQ) framework built upon a Temporal Information Block, which depends only on the time-step $t$ and not on the sampling data. Building on this block design, we devise temporal information aware reconstruction (TIAR) and finite set calibration (FSC) to align with the full-precision temporal features in limited time. With this framework, we can preserve most of the temporal information and ensure end-to-end generation quality. Extensive experiments on various datasets and diffusion models demonstrate state-of-the-art results. Remarkably, our quantization approach achieves, for the first time, model performance nearly on par with the full-precision model under 4-bit weight quantization. Additionally, our method incurs almost no extra computational cost and accelerates quantization time by $2.0 \times$ on LSUN-Bedrooms $256 \times 256$ compared to previous works.
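The observation that the temporal feature depends only on $t \in \{1, \ldots, T\}$ can be illustrated with a small sketch. The code below is not the TFMQ implementation: the sinusoidal encoding, the single random weight matrix standing in for the time-embedding modules, and the min-max calibration rule are all assumptions chosen for illustration, and every function name is hypothetical. It shows only that the full set of temporal features can be enumerated without any image data, so a quantizer can be calibrated on that finite set directly.

```python
import numpy as np

def sinusoidal_embedding(t, dim=8):
    """Hypothetical time-step encoding: maps an integer t to a dim-vector."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def temporal_features(T, weight, dim=8):
    """Temporal feature for every t in {1, ..., T}; no sampling data is used."""
    emb = np.stack([sinusoidal_embedding(t, dim) for t in range(1, T + 1)])
    return emb @ weight  # stand-in for the time-embedding modules

def calibrate_minmax(x, bits=8):
    """Assumed calibration rule: min-max range over the whole finite set."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2 ** bits - 1)
    return scale, lo

def fake_quantize(x, scale, lo, bits=8):
    """Uniform quantize-dequantize using the calibrated scale and offset."""
    q = np.clip(np.round((x - lo) / scale), 0, 2 ** bits - 1)
    return q * scale + lo

rng = np.random.default_rng(0)
T, dim = 100, 8
weight = rng.standard_normal((dim, dim))       # illustrative weights only
feats = temporal_features(T, weight, dim)      # shape (T, dim)
scale, lo = calibrate_minmax(feats)            # calibrated on all T features
feats_q = fake_quantize(feats, scale, lo)
max_err = float(np.abs(feats - feats_q).max())
```

Because the calibration range covers every feature the modules can ever produce, the quantization error in this toy setup is bounded by half a quantization step for all time-steps.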

