Diffusion models, a prevalent framework for image generation, face significant obstacles to broad deployment because of their long inference times and substantial memory requirements. Efficient post-training quantization (PTQ) is pivotal for addressing these issues in traditional models. Unlike traditional models, diffusion models depend heavily on the time-step $t$ to achieve satisfactory multi-round denoising. Usually, $t$, drawn from the finite set $\{1, \ldots, T\}$, is encoded into a temporal feature by a few modules that are entirely independent of the sampling data. However, existing PTQ methods do not optimize these modules separately. They adopt inappropriate reconstruction targets and complex calibration methods, which severely disturbs the temporal feature and the denoising trajectory and yields low compression efficiency. To address these problems, we propose a Temporal Feature Maintenance Quantization (TFMQ) framework built upon a Temporal Information Block that depends only on the time-step $t$ and not on the sampling data. Building on this block design, we devise temporal information aware reconstruction (TIAR) and finite set calibration (FSC) to align the quantized temporal features with their full-precision counterparts within a limited time. With this framework, we preserve most of the temporal information and ensure end-to-end generation quality. Extensive experiments on various datasets and diffusion models demonstrate state-of-the-art results. Remarkably, our quantization approach is the first to achieve performance nearly on par with the full-precision model under 4-bit weight quantization. Additionally, our method incurs almost no extra computational cost and accelerates quantization by $2.0 \times$ on LSUN-Bedrooms $256 \times 256$ compared to previous works. Our code is publicly available at https://github.com/ModelTC/TFMQ-DM.
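Because $t$ comes from the finite set $\{1, \ldots, T\}$ and the temporal feature does not depend on the sampling data, every temporal feature can in principle be enumerated ahead of time and calibrated individually. The sketch below illustrates that idea only; it is not the TFMQ implementation. The sinusoidal embedding, the single linear layer standing in for the time-embedding modules, the min-max calibration rule, and all function names (`temporal_feature`, `calibrate_per_timestep`, `fake_quantize`) are assumptions made for illustration.

```python
import numpy as np

def sinusoidal_embedding(t, dim=8, max_period=10000.0):
    # Common sinusoidal time-step encoding (assumed here, not taken from the paper).
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.cos(args), np.sin(args)])

def temporal_feature(t, w, b):
    # Hypothetical stand-in for the time-embedding modules: embedding + one linear layer.
    # Note that the input is t alone; no sampled image data is involved.
    return w @ sinusoidal_embedding(t, dim=w.shape[1]) + b

def calibrate_per_timestep(T, w, b, n_bits=8):
    # Enumerate every t in {1, ..., T} and store a min-max (scale, offset) pair per step.
    levels = 2 ** n_bits - 1
    params = {}
    for t in range(1, T + 1):
        x = temporal_feature(t, w, b)
        lo, hi = float(x.min()), float(x.max())
        scale = max(hi - lo, 1e-8) / levels
        params[t] = (scale, lo)
    return params

def fake_quantize(x, scale, lo, n_bits=8):
    # Uniform quantize-dequantize using the parameters calibrated for this time-step.
    levels = 2 ** n_bits - 1
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo

rng = np.random.default_rng(0)
T, dim, out_dim = 100, 8, 16
w = rng.normal(size=(out_dim, dim))   # random weights, purely illustrative
b = rng.normal(size=out_dim)

params = calibrate_per_timestep(T, w, b)
t = 37
x = temporal_feature(t, w, b)
x_q = fake_quantize(x, *params[t])
max_err = float(np.abs(x - x_q).max())
print(f"time-steps calibrated: {len(params)}, max abs error at t={t}: {max_err:.6f}")
```

Since the calibration set is exactly the $T$ time-steps, its cost grows with $T$ rather than with the number of sampled images, which is one reason a finite-set scheme of this kind can be cheap.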