Diffusion models perform remarkably well at generating high-dimensional content but are computationally intensive, especially during training. We propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a novel pipeline that reduces training costs through three stages: training a diffusion model on downsampled data, distilling the pretrained diffusion model, and progressive super-resolution. With this pipeline, PaGoDA cuts the cost of training its diffusion model by $64\times$ by training on 8x downsampled data; at inference, with a single step, it achieves state-of-the-art performance on ImageNet across all resolutions from 64x64 to 512x512, as well as on text-to-image generation. PaGoDA's pipeline can also be applied directly in the latent space, adding compression on top of the pre-trained autoencoder in Latent Diffusion Models (e.g., Stable Diffusion). The code is available at https://github.com/sony/pagoda.
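The $64\times$ figure follows from pixel-count arithmetic: downsampling by 8x in each spatial dimension shrinks the number of pixels, and hence (to first order) the per-iteration training cost, by $8^2 = 64$. A minimal sketch of this arithmetic, with the 512x512 resolution chosen for illustration:

```python
# 8x spatial downsampling reduces the pixel count by 8^2 = 64x,
# which is the source of the quoted training-cost reduction
# (assuming cost scales roughly with the number of pixels).
high_res = 512            # e.g., the highest ImageNet resolution considered
factor = 8                # downsampling factor used for diffusion training
low_res = high_res // factor

pixels_high = high_res ** 2   # 262144 pixels at full resolution
pixels_low = low_res ** 2     # 4096 pixels after downsampling
print(pixels_high // pixels_low)  # -> 64
```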