Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources. However, deploying LDMs on resource-limited devices remains difficult because of their memory consumption and inference latency. To address this, we introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing LDMs. Traditional pruning methods for deep neural networks are not tailored to the characteristics of LDMs, namely the high computational cost of training and the absence of a fast, straightforward, and task-agnostic way to evaluate model performance. Our method tackles both challenges by operating in the latent space during pruning, which lets us quantify the impact of pruning on model performance independently of the task at hand. Pruning only the components with minimal impact on the output leads to faster convergence during retraining, because the model has less information to relearn, and thereby reduces the computational cost of training. Consequently, our approach yields a compressed model with faster inference and a lower parameter count at minimal performance degradation. We demonstrate its effectiveness on three tasks: Text-to-Image (T2I) generation, Unconditional Image Generation (UIG), and Unconditional Audio Generation (UAG). Notably, we reduce the inference time of Stable Diffusion (SD) by 34.9% while simultaneously improving its FID by 5.2% on the MS-COCO T2I benchmark. This work paves the way for more efficient pruning methods for LDMs, enhancing their applicability.
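The idea of scoring components by their effect on the latent output can be illustrated with a toy sketch. This is not the LD-Pruner procedure itself: the abstract does not specify the scoring formula, so the sketch assumes a simple mean Euclidean distance between the latents produced with and without each component. The names `run_model` and `latent_shift_scores`, and the three toy operators, are hypothetical stand-ins for a real denoising network.

```python
import numpy as np

def run_model(operators, latents, skip=None):
    """Apply each operator in sequence; the one at index `skip` acts as identity."""
    x = latents
    for i, op in enumerate(operators):
        if i != skip:
            x = op(x)
    return x

def latent_shift_scores(operators, latents):
    """Score each operator by how far the output latents move when it is removed.

    Assumption: the shift is the mean per-sample Euclidean distance to the
    unpruned output. A low score marks a candidate for pruning.
    """
    reference = run_model(operators, latents)
    scores = []
    for i in range(len(operators)):
        pruned = run_model(operators, latents, skip=i)
        diff = (pruned - reference).reshape(len(latents), -1)
        scores.append(float(np.linalg.norm(diff, axis=1).mean()))
    return scores

# Fixed latent inputs, so scores are comparable across operators.
rng = np.random.default_rng(0)
latents = rng.standard_normal((8, 4, 16, 16))

# Toy stand-ins for network components (hypothetical).
operators = [
    lambda x: x * 1.001,    # near-identity: removing it barely changes the output
    lambda x: np.tanh(x),   # nonlinearity: removal changes the output noticeably
    lambda x: x + 0.5,      # constant shift: removal moves every latent value by 0.5
]

scores = latent_shift_scores(operators, latents)
ranking = sorted(range(len(operators)), key=scores.__getitem__)  # least impactful first
print(scores, ranking)
```

Because the score is computed only on latents, it needs no decoder, labels, or task-specific metric, which is the sense in which such a measure is task-agnostic. In a real LDM, the operators would be layers or blocks of the denoising network and the latents would come from the sampling loop with fixed seeds.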