Diffusion models, a prevalent framework for image generation, face significant obstacles to broad applicability because of their long inference times and substantial memory requirements. Efficient Post-training Quantization (PTQ) is pivotal for addressing these issues in traditional models. Unlike traditional models, diffusion models depend heavily on the time-step $t$ to achieve satisfactory multi-round denoising. Typically, $t$, drawn from the finite set $\{1, \ldots, T\}$, is encoded into a temporal feature by a few modules that are entirely independent of the sampling data. However, existing PTQ methods do not optimize these modules separately. They adopt inappropriate reconstruction targets and complex calibration methods, which severely disturb the temporal feature and the denoising trajectory and also lower compression efficiency. To address these issues, we propose a Temporal Feature Maintenance Quantization (TFMQ) framework built upon a Temporal Information Block that depends only on the time-step $t$ and is unrelated to the sampling data. Building on this pioneering block design, we devise temporal information aware reconstruction (TIAR) and finite set calibration (FSC) to align with the full-precision temporal features within a limited time. Equipped with this framework, we preserve most of the temporal information and ensure end-to-end generation quality. Extensive experiments on various datasets and diffusion models demonstrate state-of-the-art results. Remarkably, our quantization approach is the first to achieve performance nearly on par with the full-precision model under 4-bit weight quantization. Additionally, our method incurs almost no extra computational cost and speeds up quantization by $2.0 \times$ on LSUN-Bedrooms $256 \times 256$ compared to previous works.
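Because the temporal modules see only $t$, and $t$ takes at most $T$ distinct values, every activation such a block can ever produce can be enumerated once, without any sampled images. The sketch below illustrates that property; it is not the TFMQ, TIAR, or FSC implementation. The block architecture (sinusoidal embedding followed by Linear, SiLU, Linear, as in common diffusion U-Nets), the sizes, the random weights, $T = 1000$, and the min/max calibration rule are all illustrative assumptions.

```python
import numpy as np

def timestep_embedding(t, dim=64, max_period=10000.0):
    """Sinusoidal embedding of integer time-steps t (shape [N]) -> [N, dim]."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = t[:, None].astype(np.float64) * freqs[None, :]
    return np.concatenate([np.cos(args), np.sin(args)], axis=1)

def temporal_block(t, w1, b1, w2, b2):
    """Hypothetical temporal block: embedding -> Linear -> SiLU -> Linear.
    Its output depends only on t, never on the sampling data."""
    h = timestep_embedding(t, dim=w1.shape[0]) @ w1 + b1
    h = h / (1.0 + np.exp(-h))  # SiLU(h) = h * sigmoid(h)
    return h @ w2 + b2

def calibrate_minmax(x, n_bits=8):
    """Asymmetric uniform quantization parameters from the min/max of x
    (a simple stand-in calibration rule, assumed for illustration)."""
    lo, hi = float(x.min()), float(x.max())
    scale = max((hi - lo) / (2 ** n_bits - 1), 1e-12)
    zero_point = round(-lo / scale)
    return scale, zero_point

def fake_quant(x, scale, zero_point, n_bits=8):
    """Quantize to n_bits integers and dequantize back to floats."""
    q = np.clip(np.round(x / scale) + zero_point, 0, 2 ** n_bits - 1)
    return (q - zero_point) * scale

T = 1000                      # assumed number of diffusion steps
dim, hidden = 64, 128         # assumed block sizes
rng = np.random.default_rng(0)
w1 = rng.normal(0.0, dim ** -0.5, (dim, hidden))
b1 = np.zeros(hidden)
w2 = rng.normal(0.0, hidden ** -0.5, (hidden, hidden))
b2 = np.zeros(hidden)

# Enumerate the whole finite set {1, ..., T}: these are all the temporal
# features the block can ever emit, so calibration sees every one of them.
all_t = np.arange(1, T + 1)
features = temporal_block(all_t, w1, b1, w2, b2)   # shape [T, hidden]

scale, zp = calibrate_minmax(features, n_bits=8)
recon = fake_quant(features, scale, zp, n_bits=8)
max_err = float(np.abs(recon - features).max())
print(features.shape, f"max error {max_err:.2e}, scale/2 = {scale / 2:.2e}")
```

Since the calibration range covers every feature the block can produce for any $t$, the per-element error is bounded by half a quantization step for all time-steps, rather than only for the time-steps that happened to appear in a sampled calibration set.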