Diffusion models achieve remarkable performance across diverse generative tasks in computer vision, but their high computational cost remains a major barrier to deployment. Model pruning offers a promising way to reduce inference cost and obtain lightweight models. However, pruning degrades generation quality because it reduces model capacity. A key limitation of existing pruning approaches is that pruned models are finetuned with the same objective as the dense model (denoising score matching). Because the dense model remains accessible during finetuning, it can serve as a teacher, which calls for a more effective way to transfer knowledge from the dense model to the pruned one. Motivated by this, we propose \textbf{2ndMatch} (\textbf{2ndM}), a general-purpose finetuning framework that introduces a \textbf{2nd}-order Jacobian ($J^{\top} J$) \textbf{M}atching loss inspired by Finite-Time Lyapunov Exponents. \textbf{2ndM} teaches the pruned model to mimic the sensitivity of the dense teacher, that is, how its output responds to small perturbations over time, and it does so through scalable random projections. The framework is architecture-agnostic and applies to both U-Net- and Transformer-based diffusion models. Experiments on CIFAR-10, CelebA, LSUN, ImageNet, and MSCOCO demonstrate that \textbf{2ndM} reduces the performance gap between pruned and dense models, substantially improving output quality.
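To make the idea of matching $J^{\top} J$ through random projections concrete, the following is a minimal, dependency-free sketch rather than the method's actual implementation. It relies on the identity $v^{\top} J^{\top} J v = \lVert J v \rVert^{2}$, so the sensitivity along a random direction $v$ can be estimated from a single Jacobian-vector product without ever forming $J$. Several things here are assumptions made for illustration: the toy networks (`make_net`), the finite-difference Jacobian-vector product (`jvp_fd`), magnitude-based pruning of a small weight matrix, and the specific scalar loss (`sensitivity_matching_loss`), which may differ from the objective used by 2ndM.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def make_net(W):
    """Hypothetical toy stand-in for a denoiser: f(x) = tanh(W x)."""
    return lambda x: [math.tanh(z) for z in matvec(W, x)]


def jvp_fd(f, x, v, eps=1e-4):
    """Central finite-difference estimate of the Jacobian-vector product J(x) v."""
    plus = f([xi + eps * vi for xi, vi in zip(x, v)])
    minus = f([xi - eps * vi for xi, vi in zip(x, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]


def sensitivity_matching_loss(student, teacher, x, num_probes=8, rng=None):
    """Mean squared gap between v^T J^T J v of student and teacher.

    Assumed loss form for illustration only; probes v are random unit vectors.
    """
    if rng is None:
        rng = random.Random(0)
    total = 0.0
    for _ in range(num_probes):
        v = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]  # unit-norm probe direction
        # ||J v||^2 equals v^T (J^T J) v, so J is never formed explicitly.
        s = sum(c * c for c in jvp_fd(student, x, v))
        t = sum(c * c for c in jvp_fd(teacher, x, v))
        total += (s - t) ** 2
    return total / num_probes


rng = random.Random(0)
dim = 4
W_teacher = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]
# Assumed pruning rule: zero out small-magnitude weights of the teacher.
W_student = [[w if abs(w) > 0.3 else 0.0 for w in row] for row in W_teacher]
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]

teacher = make_net(W_teacher)
student = make_net(W_student)
loss_pruned = sensitivity_matching_loss(student, teacher, x, rng=random.Random(1))
loss_same = sensitivity_matching_loss(teacher, teacher, x, rng=random.Random(1))
print("pruned vs. dense:", loss_pruned)
print("dense vs. dense: ", loss_same)
```

In a real training loop the finite-difference products would normally be replaced by automatic differentiation (for example `jax.jvp` or `torch.func.jvp`), and a term of this kind would presumably be added to the usual denoising objective rather than used alone.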