Improving Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures

Diffusion models, emerging as powerful deep generative tools, excel in various applications. They operate through a two-step process: introducing noise into training samples and then employing a model to convert random noise into new samples (e.g., images). However, their remarkable generative performance is hindered by slow training and sampling. This is due to the necessity of tracking extensive forward and reverse diffusion trajectories, and employing a large model with numerous parameters across multiple timesteps (i.e., noise levels). To tackle these challenges, we present a multi-stage framework inspired by our empirical findings. These findings indicate the advantages of employing distinct parameters tailored to each timestep while retaining universal parameters shared across all timesteps. Our approach segments the time interval into multiple stages, where we employ a custom multi-decoder U-Net architecture that blends time-dependent models with a universally shared encoder. Our framework enables the efficient distribution of computational resources and mitigates inter-stage interference, which substantially improves training efficiency. Extensive numerical experiments affirm the effectiveness of our framework, showing significant gains in training and sampling efficiency on three state-of-the-art diffusion models, including large-scale latent diffusion models. Furthermore, our ablation studies illustrate the impact of two important components in our framework: (i) a novel timestep clustering algorithm for stage division, and (ii) an innovative multi-decoder U-Net architecture that seamlessly integrates universal and customized hyperparameters.
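
To make the routing idea in the abstract concrete, here is a minimal sketch of a shared encoder feeding stage-specific decoders. It is not the authors' implementation: the class `MultiStageModel`, the helpers `shared_encoder` and `make_decoder`, the toy arithmetic inside them, and the stage boundaries `0.3` and `0.7` are all hypothetical. In the paper the boundaries come from a timestep clustering algorithm, which is not reproduced here.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence

Vector = List[float]


def shared_encoder(x: Vector, t: float) -> Vector:
    """Toy stand-in for the encoder shared by all stages (hypothetical)."""
    return [xi * (1.0 - t) for xi in x]


def make_decoder(scale: float) -> Callable[[Vector], Vector]:
    """Build a toy stage-specific decoder; a real one would be a U-Net decoder."""
    def decoder(h: Vector) -> Vector:
        return [scale * hi for hi in h]
    return decoder


class MultiStageModel:
    """Routes each timestep to one decoder while reusing a single encoder."""

    def __init__(
        self,
        boundaries: Sequence[float],
        decoders: Sequence[Callable[[Vector], Vector]],
    ) -> None:
        # boundaries are interior cut points of the time interval;
        # k cut points define k + 1 stages, so we need k + 1 decoders.
        if len(decoders) != len(boundaries) + 1:
            raise ValueError("need exactly one decoder per stage")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        self.boundaries = list(boundaries)
        self.decoders = list(decoders)

    def stage_of(self, t: float) -> int:
        """Return the index of the stage whose interval contains t."""
        return bisect_right(self.boundaries, t)

    def __call__(self, x: Vector, t: float) -> Vector:
        h = shared_encoder(x, t)                   # universal parameters
        return self.decoders[self.stage_of(t)](h)  # stage-specific parameters


# Hypothetical three-stage split of the time interval [0, 1].
model = MultiStageModel(
    boundaries=[0.3, 0.7],
    decoders=[make_decoder(s) for s in (1.0, 2.0, 3.0)],
)
```

Calling `model(x, t)` always runs the same encoder and then only the decoder assigned to the stage containing `t`, so each decoder sees inputs from a single stage. This is the property the abstract credits with reducing inter-stage interference.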

