We introduce the Fixed Point Diffusion Model (FPDM), a novel approach to image generation that integrates the concept of fixed point solving into the framework of diffusion-based generative modeling. Our approach embeds an implicit fixed point solving layer into the denoising network of a diffusion model, transforming the diffusion process into a sequence of closely-related fixed point problems. Combined with a new stochastic training method, this approach significantly reduces model size, reduces memory usage, and accelerates training. Moreover, it enables the development of two new techniques to improve sampling efficiency: reallocating computation across timesteps and reusing fixed point solutions between timesteps. We conduct extensive experiments with state-of-the-art models on ImageNet, FFHQ, CelebA-HQ, and LSUN-Church, demonstrating substantial improvements in performance and efficiency. Compared to the state-of-the-art DiT model, FPDM contains 87% fewer parameters, consumes 60% less memory during training, and improves image generation quality in situations where sampling computation or time is limited. Our code and pretrained models are available at https://lukemelas.github.io/fixed-point-diffusion-models.
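To make the idea of reusing fixed point solutions between timesteps concrete, the following is a minimal sketch, not the FPDM implementation. It assumes a hypothetical toy contraction `toy_layer` in place of the implicit layer of the denoising network, and the names `solve_fixed_point` and `run_all_timesteps`, the tolerance, and the inputs are all illustrative assumptions. Because adjacent timesteps define closely related fixed point problems, initializing each solve from the previous solution should require fewer iterations than starting from zero.

```python
def toy_layer(z, x, t):
    # Hypothetical stand-in for the implicit layer: a contraction in z
    # (Lipschitz constant 0.5) whose fixed point shifts slightly with t.
    return [0.5 * zi + xi + 0.01 * t for zi, xi in zip(z, x)]


def solve_fixed_point(x, t, z_init, tol=1e-8, max_iter=200):
    # Plain fixed point iteration z <- f(z, x, t) until the update is tiny.
    z = list(z_init)
    for n in range(1, max_iter + 1):
        z_next = toy_layer(z, x, t)
        residual = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if residual < tol:
            return z, n
    return z, max_iter


def run_all_timesteps(x, timesteps, reuse):
    # Solve one fixed point problem per timestep; optionally warm-start
    # each solve from the solution found at the previous timestep.
    z = [0.0] * len(x)
    total_iters = 0
    for t in timesteps:
        z_init = z if reuse else [0.0] * len(x)
        z, n = solve_fixed_point(x, t, z_init)
        total_iters += n
    return z, total_iters


x = [-1.0, -0.5, 0.0, 0.5, 1.0]      # illustrative conditioning input
timesteps = list(range(10, 0, -1))   # illustrative reverse-time schedule

z_cold, iters_cold = run_all_timesteps(x, timesteps, reuse=False)
z_warm, iters_warm = run_all_timesteps(x, timesteps, reuse=True)
print("iterations without reuse:", iters_cold)
print("iterations with reuse:   ", iters_warm)
```

In this toy setting both variants reach the same solution at the final timestep, but the warm-started variant spends fewer total iterations. The same per-timestep iteration budget could also be set unevenly across timesteps, which is the spirit of reallocating computation; how FPDM does this in practice is not specified in the abstract.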