Diffusion models are the current state of the art in image generation, synthesizing high-quality images by breaking the generation process into many fine-grained denoising steps. Despite their strong performance, diffusion models are computationally expensive, requiring many neural function evaluations (NFEs). In this work, we propose an anytime diffusion-based method that can generate viable images when stopped at an arbitrary time before completion. Using existing pretrained diffusion models, we show that the generation scheme can be recomposed as two nested diffusion processes, enabling fast iterative refinement of a generated image. We use this Nested Diffusion approach to peek into the generation process and to enable flexible scheduling based on the instantaneous preference of the user. In experiments on ImageNet and Stable Diffusion-based text-to-image generation, we show, both qualitatively and quantitatively, that the intermediate generation quality of our method greatly exceeds that of the original diffusion model, while the final slow generation result remains comparable.
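To make the idea of two nested diffusion processes concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a deterministic DDIM-style update, a cosine-shaped noise schedule, and a placeholder `toy_x0_predictor` standing in for a pretrained denoiser; all function names and parameters here are hypothetical. Each outer step runs a short inner diffusion to obtain a clean-image estimate, yields that estimate as an anytime result together with the NFEs spent so far, and then uses it to advance the outer process.

```python
import numpy as np


def alpha_bar(t):
    """Toy noise schedule (assumption): 1 at t=0 (clean), ~0 at t=1 (pure noise)."""
    return np.cos(0.5 * np.pi * t) ** 2


def toy_x0_predictor(x_t, t):
    """Hypothetical stand-in for a pretrained denoiser that guesses the clean image."""
    return np.tanh(x_t)


def ddim_step(x_t, x0_hat, t, t_next):
    """Deterministic DDIM-style move from noise level t to t_next, given a clean estimate."""
    a_t, a_next = alpha_bar(t), alpha_bar(t_next)
    eps_hat = (x_t - np.sqrt(a_t) * x0_hat) / np.sqrt(max(1.0 - a_t, 1e-8))
    return np.sqrt(a_next) * x0_hat + np.sqrt(1.0 - a_next) * eps_hat


def inner_diffusion(x_t, t, inner_steps):
    """Run a short inner diffusion from level t down to 0; return the clean estimate."""
    ts = np.linspace(t, 0.0, inner_steps + 1)
    x = x_t
    for t_cur, t_nxt in zip(ts[:-1], ts[1:]):
        x0_hat = toy_x0_predictor(x, t_cur)  # one NFE per inner step
        x = ddim_step(x, x0_hat, t_cur, t_nxt)
    return x


def nested_diffusion(shape, outer_steps=5, inner_steps=4, seed=0):
    """Yield (clean-image estimate, NFEs used so far) after every outer step."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start the outer process from pure noise
    ts = np.linspace(1.0, 0.0, outer_steps + 1)
    nfes = 0
    for t_cur, t_nxt in zip(ts[:-1], ts[1:]):
        x0_est = inner_diffusion(x, t_cur, inner_steps)
        nfes += inner_steps
        yield x0_est, nfes  # anytime output: usable if generation stops here
        x = ddim_step(x, x0_est, t_cur, t_nxt)  # advance the outer process


if __name__ == "__main__":
    budget = 12  # hypothetical NFE budget chosen by the user
    for image, used in nested_diffusion((8, 8)):
        if used >= budget:
            break  # stop early and keep the latest estimate
    print(image.shape, used)
```

Writing the sampler as a generator is one way to express the anytime property: the caller decides when to stop, and every yielded value is already a complete image estimate rather than a partially denoised latent.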