We present in this paper a novel post-training quantization (PTQ) method, dubbed AccuQuant, for diffusion models. We show analytically and empirically that quantization errors for diffusion models accumulate over the denoising steps of a sampling process. To alleviate this error accumulation problem, AccuQuant minimizes the discrepancies between the outputs of a full-precision diffusion model and its quantized version over multiple denoising steps. That is, it explicitly simulates multiple denoising steps of a diffusion sampling process for quantization, accounting for the errors accumulated over these steps, in contrast to previous approaches that imitate the training process of diffusion models by minimizing the discrepancies independently at each step. We also present an efficient implementation technique for AccuQuant, together with a novel objective, which reduces the memory complexity significantly from $\mathcal{O}(n)$ to $\mathcal{O}(1)$, where $n$ is the number of denoising steps. We demonstrate the efficacy and efficiency of AccuQuant across various tasks and diffusion models on standard benchmarks.
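To make the multi-step idea concrete, the following is a minimal PyTorch sketch of a calibration loss that compares full-precision and quantized models after several shared denoising steps, rather than step by step. The names `fp_eps`, `q_eps` (noise-prediction networks), `ddim_step` (a standard deterministic DDIM update), and the MSE objective are illustrative assumptions; the paper's actual objective and its $\mathcal{O}(1)$-memory implementation are not reproduced here.

```python
import torch

def ddim_step(x, eps, a_t, a_prev):
    # One deterministic DDIM update x_t -> x_{t-1}, given the predicted noise
    # eps and the cumulative alpha-bar values at the current/previous steps.
    x0 = (x - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_prev.sqrt() * x0 + (1.0 - a_prev).sqrt() * eps

def multi_step_loss(fp_eps, q_eps, x_t, ts, alphas):
    # Roll the full-precision and quantized models through the *same* few
    # denoising steps from a shared latent x_t, then compare the final
    # latents. The objective therefore reflects the quantization error
    # accumulated across steps, not just a single step's error.
    x_fp, x_q = x_t, x_t
    for t, t_prev in zip(ts[:-1], ts[1:]):
        a_t, a_prev = alphas[t], alphas[t_prev]
        with torch.no_grad():  # full-precision branch needs no gradients
            x_fp = ddim_step(x_fp, fp_eps(x_fp, t), a_t, a_prev)
        x_q = ddim_step(x_q, q_eps(x_q, t), a_t, a_prev)
    return torch.mean((x_fp - x_q) ** 2)
```

Note that backpropagating through this naive rollout stores activations for every simulated step, i.e., $\mathcal{O}(n)$ memory in the number of steps; the efficient implementation and objective described in the abstract are what bring this down to $\mathcal{O}(1)$.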