Diffusion models have emerged as the dominant approach to image generation. To support training large diffusion models, this paper studies pipeline parallel training of diffusion models and proposes DiffusionPipe, a synchronous pipeline training system that introduces an innovative pipeline bubble-filling technique tailored to the structural characteristics of diffusion models. State-of-the-art diffusion models typically include both trainable parts (the backbone) and non-trainable parts (e.g., frozen input encoders). We first use a dynamic programming approach that unifies optimal stage partitioning and pipeline scheduling for representative diffusion models with either a single backbone or multiple backbones. We then use an efficient greedy algorithm to fill the computation of the non-trainable model parts into the idle periods (bubbles) of the backbones' pipeline training, thus achieving high training throughput. Extensive experiments show that DiffusionPipe achieves up to 1.41x speedup over pipeline parallel methods and up to 1.28x speedup over data parallel training on popular diffusion models.
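The two algorithmic steps named above can be illustrated with a minimal Python sketch under strong simplifying assumptions; it is not DiffusionPipe's actual algorithm. The hypothetical `partition_stages` uses a classic dynamic program that splits a single backbone, modeled as a chain of per-layer compute times, into contiguous stages minimizing the slowest stage (the pipeline bottleneck). It ignores scheduling and multiple backbones. The hypothetical `fill_bubbles` uses a longest-task-first greedy heuristic, chosen here only for illustration, to pack non-trainable computation (e.g., frozen encoder work) into idle bubble durations. It treats tasks as independent and indivisible and ignores any data dependencies.

```python
from typing import List, Tuple


def partition_stages(layer_times: List[float], num_stages: int) -> Tuple[float, List[int]]:
    """Hypothetical helper: split a chain of layers into `num_stages` contiguous
    stages so the slowest stage is as fast as possible.
    Returns (bottleneck_time, start index of each stage)."""
    n = len(layer_times)
    if not 1 <= num_stages <= n:
        raise ValueError("need 1 <= num_stages <= number of layers")
    prefix = [0.0]  # prefix[i] = total time of the first i layers
    for t in layer_times:
        prefix.append(prefix[-1] + t)
    inf = float("inf")
    # best[s][i]: minimal bottleneck when the first i layers form s stages
    best = [[inf] * (n + 1) for _ in range(num_stages + 1)]
    cut = [[0] * (n + 1) for _ in range(num_stages + 1)]
    best[0][0] = 0.0
    for s in range(1, num_stages + 1):
        for i in range(s, n + 1):
            for j in range(s - 1, i):  # last stage covers layers j .. i-1
                cost = max(best[s - 1][j], prefix[i] - prefix[j])
                if cost < best[s][i]:
                    best[s][i] = cost
                    cut[s][i] = j
    # Walk the cut table backwards to recover where each stage starts.
    starts, i = [], n
    for s in range(num_stages, 0, -1):
        i = cut[s][i]
        starts.append(i)
    return best[num_stages][n], starts[::-1]


def fill_bubbles(bubbles: List[float], tasks: List[float]) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: greedily place non-trainable task durations into idle
    bubbles. Longest task first, each into the bubble with the most remaining
    idle time that can still hold it; tasks that fit nowhere are overflow."""
    remaining = list(bubbles)
    placement: List[List[float]] = [[] for _ in bubbles]
    overflow: List[float] = []
    for task in sorted(tasks, reverse=True):
        fits = [k for k, free in enumerate(remaining) if free >= task]
        if not fits:
            overflow.append(task)  # would have to run outside the bubbles
            continue
        k = max(fits, key=lambda idx: remaining[idx])
        remaining[k] -= task
        placement[k].append(task)
    return placement, overflow


if __name__ == "__main__":
    print(partition_stages([4, 2, 3, 5, 1, 3], 3))  # (8.0, [0, 2, 4])
    print(fill_bubbles([5.0, 3.0, 2.0], [4.0, 2.0, 2.0, 1.0, 3.0]))
    # ([[4.0, 1.0], [3.0], [2.0]], [2.0])
```

In this toy run, the backbone is cut into stages of time 6, 8, and 4, and one 2-unit encoder task does not fit into any bubble, so it would add to iteration time. Reducing such overflow is the motivation for filling bubbles with non-trainable work.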