Improving Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures

Diffusion models, emerging as powerful deep generative tools, excel in various applications. They operate through a two-step process: introducing noise into training samples and then employing a model to convert random noise into new samples (e.g., images). However, their remarkable generative performance is hindered by slow training and sampling. This is due to the necessity of tracking extensive forward and reverse diffusion trajectories, and of employing a large model with numerous parameters across multiple timesteps (i.e., noise levels). To tackle these challenges, we present a multi-stage framework inspired by our empirical findings. These observations indicate the advantages of employing distinct parameters tailored to each timestep while retaining universal parameters shared across all timesteps. Our approach segments the time interval into multiple stages, in which we employ a custom multi-decoder U-net architecture that blends time-dependent models with a universally shared encoder. Our framework enables the efficient distribution of computational resources and mitigates inter-stage interference, which substantially improves training efficiency. Extensive numerical experiments affirm the effectiveness of our framework, showing significant training and sampling efficiency enhancements on three state-of-the-art diffusion models, including large-scale latent diffusion models. Furthermore, our ablation studies illustrate the impact of two important components in our framework: (i) a novel timestep clustering algorithm for stage division, and (ii) an innovative multi-decoder U-net architecture, seamlessly integrating universal and customized hyperparameters.
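
The routing idea behind the shared encoder and stage-specific decoders can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name `MultiStageDenoiser`, the hand-picked stage boundaries `[0.3, 0.7]` (the abstract's timestep clustering algorithm is not reproduced here), and the toy `shared_encoder` and `make_decoder` functions, which stand in for real U-net components.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence

Vector = List[float]


class MultiStageDenoiser:
    """Toy shared-encoder / per-stage-decoder router (illustrative only)."""

    def __init__(
        self,
        boundaries: Sequence[float],
        encoder: Callable[[Vector, float], Vector],
        decoders: Sequence[Callable[[Vector, float], Vector]],
    ) -> None:
        # boundaries are interior cut points of the time interval;
        # k cut points define k + 1 stages, so k + 1 decoders are needed.
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        if len(decoders) != len(boundaries) + 1:
            raise ValueError("need exactly one decoder per stage")
        self.boundaries = list(boundaries)
        self.encoder = encoder
        self.decoders = list(decoders)

    def stage_of(self, t: float) -> int:
        """Return the index of the stage whose interval contains t."""
        return bisect_right(self.boundaries, t)

    def __call__(self, x: Vector, t: float) -> Vector:
        features = self.encoder(x, t)  # parameters shared by all timesteps
        decoder = self.decoders[self.stage_of(t)]  # stage-specific parameters
        return decoder(features, t)


# Hypothetical stand-ins for real networks.
def shared_encoder(x: Vector, t: float) -> Vector:
    return [v + t for v in x]


def make_decoder(scale: float) -> Callable[[Vector, float], Vector]:
    return lambda h, t: [scale * v for v in h]


model = MultiStageDenoiser(
    boundaries=[0.3, 0.7],  # assumed cut points, not learned by clustering
    encoder=shared_encoder,
    decoders=[make_decoder(1.0), make_decoder(2.0), make_decoder(3.0)],
)
```

Calling `model(x, t)` runs the shared encoder for every timestep but only the decoder of the stage that contains `t`, so each decoder would be trained only on its own range of noise levels.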
