Diffusion models have emerged as a leading class of generative models. Their generative process is sequential: over hundreds or even thousands of timesteps, a diffusion model progressively reconstructs an image from pure Gaussian noise, and each timestep requires a full inference pass through the entire model. The resulting computational cost makes deployment challenging, so quantization is widely used to lower the bit-width and thereby reduce storage and compute overhead. Current quantization methods focus primarily on model-side optimization and disregard the temporal dimension, such as the length of the timestep sequence. Redundant timesteps therefore continue to consume computational resources, leaving substantial room for accelerating the generative process. In this paper, we introduce TMPQ-DM, which jointly optimizes timestep reduction and quantization to achieve a superior performance-efficiency trade-off, addressing both the temporal and the model side. For timestep reduction, we devise a non-uniform grouping scheme tailored to the non-uniform nature of the denoising process, which mitigates the combinatorial explosion of timestep choices. For quantization, we adopt a fine-grained layer-wise approach that allocates different bit-widths to different layers according to their contribution to the final generative performance, rectifying the performance degradation observed in prior studies. To speed up the evaluation of fine-grained quantization, we further devise a super-network that serves as a precision solver by leveraging shared quantization results. These two components are integrated within our framework, enabling rapid joint exploration of the exponentially large decision space via a gradient-free evolutionary search algorithm.
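The joint search over timesteps and bit-widths can be illustrated with a toy sketch. This is not the authors' implementation: the group boundaries, the bit-width choices, the cost budget, and the `proxy_score` function are all hypothetical placeholders chosen for illustration. In a real system the score would come from running the quantized sampler and measuring generation quality. Each candidate records how many timesteps to keep in each non-uniform group and which bit-width each layer gets, and a gradient-free evolutionary loop (selection, mutation, crossover) searches under a compute budget.

```python
import random

# Hypothetical toy settings; none of these values come from the paper.
GROUPS = [(0, 10), (10, 30), (30, 60), (60, 100)]  # non-uniform timestep groups [start, end)
NUM_LAYERS = 6
BIT_CHOICES = (4, 6, 8)
BUDGET = 0.35  # max allowed cost, relative to all timesteps at 8 bits


def random_candidate(rng):
    """Sample how many timesteps to keep per group and a bit-width per layer."""
    keep = [rng.randint(1, end - start) for start, end in GROUPS]
    bits = [rng.choice(BIT_CHOICES) for _ in range(NUM_LAYERS)]
    return keep, bits


def cost(cand):
    """Relative compute: fraction of timesteps kept times mean bit-width over 8."""
    keep, bits = cand
    total_steps = GROUPS[-1][1]
    return (sum(keep) / total_steps) * (sum(bits) / (8 * NUM_LAYERS))


def feasible(cand):
    return cost(cand) <= BUDGET


def proxy_score(cand):
    """Placeholder for generation quality (higher is better), assumed here to
    grow with per-group timestep coverage and with mean bit-width."""
    keep, bits = cand
    coverage = sum(k / (end - start) for k, (start, end) in zip(keep, GROUPS))
    return (coverage / len(GROUPS)) * (sum(bits) / (8 * NUM_LAYERS))


def mutate(cand, rng):
    """Resample the kept count of one group and the bit-width of one layer."""
    keep, bits = list(cand[0]), list(cand[1])
    g = rng.randrange(len(GROUPS))
    start, end = GROUPS[g]
    keep[g] = rng.randint(1, end - start)
    bits[rng.randrange(NUM_LAYERS)] = rng.choice(BIT_CHOICES)
    return keep, bits


def crossover(a, b, rng):
    """Take each decision from one of the two parents at random."""
    keep = [rng.choice(pair) for pair in zip(a[0], b[0])]
    bits = [rng.choice(pair) for pair in zip(a[1], b[1])]
    return keep, bits


def evolutionary_search(generations=30, pop_size=40, top_k=10, seed=0):
    rng = random.Random(seed)
    population = []
    while len(population) < pop_size:  # rejection-sample a feasible start
        cand = random_candidate(rng)
        if feasible(cand):
            population.append(cand)
    for _ in range(generations):
        parents = sorted(population, key=proxy_score, reverse=True)[:top_k]
        children = list(parents)  # elitism: the best candidates survive
        while len(children) < pop_size:
            if rng.random() < 0.5:
                child = mutate(rng.choice(parents), rng)
            else:
                child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if feasible(child):  # discard children that exceed the budget
                children.append(child)
        population = children
    return max(population, key=proxy_score)


if __name__ == "__main__":
    keep, bits = evolutionary_search()
    print("timesteps kept per group:", keep)
    print("bit-width per layer:", bits)
```

Because the search only compares scores, it needs no gradients through the sampler or the quantizer, which is why this family of methods suits a discrete decision space like this one. The toy score makes short groups cheap to cover fully, so the search tends to spend its budget unevenly across groups.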