Diffusion models (DMs) are a powerful class of generative models that have achieved state-of-the-art results in various image synthesis tasks and have shown potential in other domains, such as natural language processing and temporal data modeling. Despite their stable training dynamics and ability to produce diverse, high-quality samples, DMs are notorious for requiring significant computational resources in both the training and inference stages. Previous work has focused mostly on improving the efficiency of model inference. This paper introduces, for the first time, the paradigm of sparse-to-sparse training to DMs, with the aim of improving both training and inference efficiency. We focus on unconditional generation and train two sparse DMs from scratch (Latent Diffusion and ChiroDiff) on six datasets using three different methods (Static-DM, RigL-DM, and MagRan-DM) to study the effect of sparsity on model performance. Our experiments show that sparse DMs are able to match, and often outperform, their dense counterparts while substantially reducing the number of trainable parameters and FLOPs. We also identify safe and effective sparsity levels for sparse-to-sparse training of DMs.
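To make the sparse-to-sparse paradigm concrete, the following is a minimal illustrative sketch (not the paper's implementation) of one dynamic mask-update step in the spirit of MagRan-DM: the network is kept at a fixed sparsity throughout training by pruning the lowest-magnitude active weights and regrowing an equal number of connections at random. The function name, the layer shape, and the `update_frac` parameter are hypothetical placeholders chosen for the example.

```python
import torch

def magnitude_prune_random_regrow(weight, mask, update_frac=0.1):
    """One sketch of a dynamic sparse-training mask update: prune the
    `update_frac` fraction of active weights with the smallest magnitude,
    then regrow the same number of random inactive connections, so the
    overall sparsity level stays constant (assumed behavior, for illustration)."""
    with torch.no_grad():
        active = mask.bool()
        n_update = int(update_frac * int(active.sum()))
        if n_update == 0:
            return mask

        # Prune: inactive positions are set to +inf so topk(largest=False)
        # only selects the smallest-magnitude *active* weights.
        scores = weight.abs().masked_fill(~active, float("inf"))
        drop_idx = torch.topk(scores.flatten(), n_update, largest=False).indices
        mask.flatten()[drop_idx] = 0.0

        # Regrow: re-activate n_update randomly chosen inactive positions,
        # initializing the new connections at zero.
        inactive_idx = (~mask.bool()).flatten().nonzero(as_tuple=True)[0]
        grow_idx = inactive_idx[torch.randperm(len(inactive_idx))[:n_update]]
        mask.flatten()[grow_idx] = 1.0
        weight.flatten()[grow_idx] = 0.0

        weight.mul_(mask)  # keep pruned weights exactly zero
    return mask

# Usage: a ~90%-sparse weight tensor whose mask would be refreshed
# periodically during training (e.g., every few hundred steps).
w = torch.randn(256, 256)
mask = (torch.rand_like(w) > 0.9).float()  # ~10% of weights active
w *= mask
mask = magnitude_prune_random_regrow(w, mask, update_frac=0.1)
```

Under this reading, a static scheme (such as Static-DM) would fix the mask at initialization and skip the update step entirely, while gradient-informed schemes (such as RigL) would replace the random regrowth with a criterion based on gradient magnitude.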