Diffusion models have emerged as powerful deep generative tools that excel in a wide range of applications. They operate through a two-step process: introducing noise into training samples and then employing a model to convert random noise into new samples (e.g., images). However, their remarkable generative performance is hindered by slow training and sampling. This slowness stems from the need to track long forward and reverse diffusion trajectories and to employ a large model with numerous parameters across multiple timesteps (i.e., noise levels). To tackle these challenges, we present a multi-stage framework inspired by our empirical findings, which indicate the advantages of employing distinct parameters tailored to each timestep while retaining universal parameters shared across all timesteps. Our approach segments the time interval into multiple stages, for which we employ a custom multi-decoder U-Net architecture that blends time-dependent models with a universally shared encoder. Our framework enables the efficient distribution of computational resources and mitigates inter-stage interference, which substantially improves training efficiency. Extensive numerical experiments affirm the effectiveness of our framework, showing significant improvements in training and sampling efficiency on three state-of-the-art diffusion models, including large-scale latent diffusion models. Furthermore, our ablation studies illustrate the impact of two important components of our framework: (i) a novel timestep clustering algorithm for stage division, and (ii) an innovative multi-decoder U-Net architecture that seamlessly integrates universal and customized hyperparameters.
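As a rough illustration of the shared-encoder, per-stage-decoder idea described above, the following minimal sketch routes each timestep to the decoder of its stage while reusing one encoder for all timesteps. It is a toy under stated assumptions, not the architecture from this work: the class name `MultiStageDenoiser`, the stage boundaries `[0.3, 0.7]`, and the toy encoder and decoders are all hypothetical, the boundaries are hand-picked rather than produced by the timestep clustering algorithm, and the U-Net structure (including skip connections) is omitted.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence

Vector = List[float]


class MultiStageDenoiser:
    """Toy router: one shared encoder, one decoder per time stage (hypothetical)."""

    def __init__(
        self,
        boundaries: Sequence[float],
        encoder: Callable[[Vector, float], Vector],
        decoders: Sequence[Callable[[Vector, float], Vector]],
    ) -> None:
        # k interior cut points split the time interval into k + 1 stages,
        # so exactly k + 1 decoders are required.
        if list(boundaries) != sorted(boundaries):
            raise ValueError("stage boundaries must be sorted")
        if len(decoders) != len(boundaries) + 1:
            raise ValueError("need one decoder per stage")
        self.boundaries = list(boundaries)
        self.encoder = encoder
        self.decoders = list(decoders)

    def stage_of(self, t: float) -> int:
        # Index of the stage containing t; a cut point belongs to the later stage.
        return bisect_right(self.boundaries, t)

    def __call__(self, x: Vector, t: float) -> Vector:
        h = self.encoder(x, t)                     # parameters shared by all timesteps
        return self.decoders[self.stage_of(t)](h, t)  # stage-specific parameters


def shared_encoder(x: Vector, t: float) -> Vector:
    # Stand-in for the shared encoder: doubles every entry.
    return [2.0 * v for v in x]


def make_decoder(offset: float) -> Callable[[Vector, float], Vector]:
    # Stand-in for a stage-specific decoder: adds a stage-dependent offset.
    def decoder(h: Vector, t: float) -> Vector:
        return [v + offset for v in h]
    return decoder


# Three stages over t in [0, 1], with hand-picked (assumed) cut points.
model = MultiStageDenoiser(
    boundaries=[0.3, 0.7],
    encoder=shared_encoder,
    decoders=[make_decoder(o) for o in (0.0, 10.0, 20.0)],
)
```

Calling `model([1.0], 0.5)` runs the shared encoder and then the second decoder, since `0.5` falls in the middle stage. In a real system each decoder would be a trained network, and only the decoder of the active stage would be evaluated and updated for a given timestep.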