Large-scale text-to-image diffusion models, while powerful, suffer from prohibitive computational cost. Existing one-shot network pruning methods can hardly be applied to them directly because of the iterative denoising nature of diffusion models. To bridge this gap, this paper presents OBS-Diff, a novel one-shot pruning framework that enables accurate and training-free compression of large-scale text-to-image diffusion models. Specifically, (i) OBS-Diff revitalizes the classic Optimal Brain Surgeon (OBS), adapting it to the complex architectures of modern diffusion models and supporting diverse pruning granularities, including unstructured, N:M semi-structured, and structured (MHA heads and FFN neurons) sparsity; (ii) to align the pruning criteria with the iterative dynamics of the diffusion process, we examine the problem from an error-accumulation perspective and propose a novel timestep-aware Hessian construction with a logarithmic-decrease weighting scheme that assigns greater importance to earlier timesteps, mitigating potential error accumulation; (iii) furthermore, we propose a computationally efficient group-wise sequential pruning strategy to amortize the expensive calibration process. Extensive experiments show that OBS-Diff achieves state-of-the-art one-shot pruning for diffusion models, delivering inference acceleration with minimal degradation in visual quality.
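To make the timestep-aware Hessian construction concrete, the following is a minimal PyTorch sketch of how calibration activations collected at different denoising steps could be combined with a logarithmically decreasing weight (earlier timesteps weighted more heavily). The function names (`timestep_weights`, `accumulate_hessian`), the exact weighting formula, and the damping constant are illustrative assumptions, not the paper's released implementation.

```python
import torch


def timestep_weights(num_steps: int) -> torch.Tensor:
    """Hypothetical logarithmic-decrease weighting: denoising step 0 (earliest)
    receives the largest weight, later steps progressively smaller ones."""
    idx = torch.arange(num_steps, dtype=torch.float32)
    w = torch.log(torch.tensor(float(num_steps + 1))) - torch.log(idx + 1.0)
    return w / w.sum()  # normalize so the weights sum to 1


def accumulate_hessian(layer_inputs_per_step, hidden_dim: int, damp: float = 1e-2) -> torch.Tensor:
    """Accumulate a weighted OBS-style Hessian H = sum_t w_t * X_t^T X_t + damp * I,
    where X_t holds the layer's calibration activations at denoising step t."""
    H = torch.zeros(hidden_dim, hidden_dim)
    w = timestep_weights(len(layer_inputs_per_step))
    for t, X in enumerate(layer_inputs_per_step):  # X: (num_tokens, hidden_dim)
        H += w[t] * (X.T @ X)
    H += damp * torch.eye(hidden_dim)              # damping keeps H well-conditioned
    return H
```

Under this sketch, the resulting H would then drive the standard OBS pruning and weight-update step for the layer; the key point is only that each timestep's contribution to H is reweighted before inversion.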