Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is computationally expensive at inference time, making diffusion models less practical for low-latency and scalable real-world applications. Post-training quantization (PTQ) of diffusion models can significantly reduce the model size and accelerate the sampling process without re-training. Nonetheless, applying existing PTQ methods directly to low-bit diffusion models can significantly impair the quality of generated samples. Specifically, at each denoising step, quantization noise causes the estimated mean to deviate and the variance to mismatch the predetermined variance schedule. As the sampling process proceeds, the quantization noise may accumulate, resulting in a low signal-to-noise ratio (SNR) during the later denoising steps. To address these challenges, we propose a unified formulation for the quantization noise and the diffusion-perturbed noise in the quantized denoising process. We first disentangle the quantization noise into a part that is correlated with its full-precision counterpart and a residual uncorrelated part. The correlated part can be easily corrected by estimating the correlation coefficient. For the uncorrelated part, we subtract the bias from the quantized results to correct the mean deviation and calibrate the denoising variance schedule to absorb the excess variance resulting from quantization. Moreover, we introduce a mixed-precision scheme for selecting the optimal bitwidth for each denoising step. Extensive experiments demonstrate that our method outperforms previous post-training quantized diffusion models, with only a 0.06 increase in FID score compared to full-precision LDM-4 on ImageNet 256x256, while reducing bit operations by 19.9x. Code is available at https://github.com/ziplab/PTQD.
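The noise-correction idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes the quantization noise follows a linear model, noise = k * fp_out + residual, with k fitted by least squares without an intercept, and it assumes the excess variance is absorbed by simply subtracting it from the scheduled variance. All function and parameter names here (`disentangle_quant_noise`, `correct_output`, `calibrated_variance`) are hypothetical.

```python
import numpy as np


def disentangle_quant_noise(fp_out, q_out):
    """Split quantization noise into a correlated part and a residual.

    Assumption: noise = k * fp_out + residual, with k fitted by
    least squares (no intercept). Returns k plus the residual's
    mean (bias) and variance.
    """
    fp = np.asarray(fp_out, dtype=np.float64).ravel()
    q = np.asarray(q_out, dtype=np.float64).ravel()
    noise = q - fp
    # Least-squares slope of the noise with respect to the full-precision output.
    k = float(np.dot(fp, noise) / np.dot(fp, fp))
    residual = noise - k * fp
    return k, float(residual.mean()), float(residual.var())


def correct_output(q_out, k, bias):
    """Remove the residual bias, then divide out the correlated part.

    Since q_out = (1 + k) * fp_out + residual, the result equals
    fp_out + (residual - bias) / (1 + k), whose error has zero mean.
    """
    return (np.asarray(q_out, dtype=np.float64) - bias) / (1.0 + k)


def calibrated_variance(scheduled_var, excess_var):
    """Shrink the scheduled variance by the excess variance, clamped at zero.

    Assumption: excess_var is already expressed in the same units as the
    scheduled variance, e.g. residual variance divided by (1 + k) ** 2 and
    multiplied by the squared sampler coefficient on the noise estimate.
    """
    return max(scheduled_var - excess_var, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fp = rng.normal(size=10_000)                     # stand-in for full-precision outputs
    q = 1.2 * fp + 0.05 + 0.1 * rng.normal(size=fp.shape)  # synthetic quantized outputs
    k, bias, var = disentangle_quant_noise(fp, q)
    corrected = correct_output(q, k, bias)
    print(k, bias, var, np.mean(corrected - fp))
```

In practice the statistics would be estimated per denoising step from calibration data, since the abstract notes that quantization noise behaves differently as sampling proceeds. The clamp at zero covers steps where the excess variance exceeds what the schedule can absorb.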