PTQD: Accurate Post-Training Quantization for Diffusion Models

Diffusion models have recently dominated image synthesis and other related generative tasks. However, the iterative denoising process is computationally expensive at inference time, making diffusion models less practical for low-latency and scalable real-world applications. Post-training quantization of diffusion models can significantly reduce the model size and accelerate the sampling process without requiring any re-training. Nonetheless, applying existing post-training quantization methods directly to low-bit diffusion models can significantly impair the quality of generated samples. Specifically, at each denoising step, quantization noise causes the estimated mean to deviate and the variance to mismatch the predetermined variance schedule. Moreover, as the sampling process proceeds, the quantization noise may accumulate, resulting in a low signal-to-noise ratio (SNR) in the late denoising steps. To address these challenges, we propose a unified formulation for the quantization noise and the diffusion perturbed noise in the quantized denoising process. We first disentangle the quantization noise into a part that is correlated with its full-precision counterpart and a residual uncorrelated part. The correlated part can be easily corrected by estimating the correlation coefficient. For the uncorrelated part, we calibrate the denoising variance schedule to absorb the excess variance resulting from quantization. Moreover, we propose a mixed-precision scheme to choose the optimal bitwidth for each denoising step, which prefers low bits to accelerate the early denoising steps and high bits to maintain a high SNR in the late steps. Extensive experiments demonstrate that our method outperforms previous post-training quantized diffusion models in generating high-quality samples, with only a 0.06 increase in FID score compared to full-precision LDM-4 on ImageNet 256x256, while saving 19.9x bit operations.
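
To make the noise decomposition concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the quantization noise can be modeled as a scalar multiple k of the full-precision output plus a residual, estimates k by least squares, rescales the quantized output by 1 / (1 + k), and subtracts the remaining residual variance from a scheduled variance. All function names, the synthetic data, and the `noise_coeff` parameter (standing in for whatever sampler coefficient multiplies the network output) are hypothetical.

```python
import numpy as np


def split_quantization_noise(eps_fp, eps_q):
    """Split quantization noise into a part correlated with eps_fp and a residual.

    Assumed model: eps_q - eps_fp = k * eps_fp + residual.
    """
    eps_fp = np.asarray(eps_fp, dtype=np.float64).ravel()
    eps_q = np.asarray(eps_q, dtype=np.float64).ravel()
    noise = eps_q - eps_fp
    # Least-squares slope (no intercept) of the noise on the full-precision output.
    k = float(noise @ eps_fp / (eps_fp @ eps_fp))
    residual = noise - k * eps_fp
    return k, residual


def correct_output(eps_q, k):
    """Undo the correlated part: eps_q = (1 + k) * eps_fp + residual."""
    return np.asarray(eps_q, dtype=np.float64) / (1.0 + k)


def calibrate_variance(sigma2, residual, k, noise_coeff=1.0):
    """Shrink the scheduled variance by the excess variance left after correction.

    noise_coeff is a hypothetical stand-in for the sampler-dependent factor
    that scales the network output in the denoising update.
    """
    excess = noise_coeff ** 2 * np.var(residual) / (1.0 + k) ** 2
    # Clamp at zero: the schedule cannot absorb more variance than it has.
    return max(sigma2 - excess, 0.0)


# Synthetic calibration data (an assumption for illustration only).
rng = np.random.default_rng(0)
eps_fp = rng.standard_normal(10_000)
eps_q = 1.1 * eps_fp + 0.05 * rng.standard_normal(10_000)

k, residual = split_quantization_noise(eps_fp, eps_q)
eps_corrected = correct_output(eps_q, k)
sigma2_calibrated = calibrate_variance(1.0, residual, k)
print(k, sigma2_calibrated)
```

In this toy setup the estimated k should land close to the 0.1 used to generate the data, and the calibrated variance ends up slightly below the scheduled value of 1.0. When the residual variance exceeds the scheduled variance, the clamp returns zero, which is the kind of step where a higher bitwidth would be the natural remedy.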
