Diffusion Models Are Innate One-Step Generators

Diffusion Models (DMs) have achieved great success in image generation and other fields. By sampling finely along the trajectory defined by an SDE/ODE solver driven by a well-trained score model, DMs can generate remarkably high-quality results. However, this precise sampling typically requires many steps and is computationally demanding. To address this problem, instance-based distillation methods have been proposed, which distill a one-step generator from a DM by having a simpler student model mimic a more complex teacher model. Yet our research reveals an inherent limitation of these methods: the teacher model, with more steps and more parameters, occupies different local minima from the student model, so the student performs suboptimally when it attempts to replicate the teacher. To avoid this problem, we introduce a novel distributional distillation method that uses an exclusive distributional loss. This method exceeds state-of-the-art (SOTA) results while requiring significantly fewer training images. Additionally, we show that the layers of a DM are activated differently at different time steps, which gives the model an inherent capability to generate images in a single step. Freezing most of the convolutional layers in a DM during distributional distillation leads to further performance improvements. Our method achieves SOTA results on CIFAR-10 (FID 1.54), AFHQv2 64x64 (FID 1.23), FFHQ 64x64 (FID 0.85) and ImageNet 64x64 (FID 1.16) with great efficiency. Most of these results are obtained with only 5 million training images, within 6 hours on 8 A100 GPUs. This breakthrough not only enhances the understanding of efficient image generation models but also offers a scalable framework for advancing the state of the art in various applications.
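The contrast between instance-based and distributional objectives can be illustrated with a toy one-dimensional sketch. This is not the paper's loss, which the abstract does not specify: the empirical Wasserstein-1 distance stands in for a distributional loss, and the `teacher` and `student` functions are hypothetical. The student below maps each noise to a different output than the teacher, yet produces exactly the same output distribution.

```python
import random


def instance_loss(student, teacher, noises):
    """Mean squared error between student and teacher outputs on the SAME noise."""
    return sum((student(z) - teacher(z)) ** 2 for z in noises) / len(noises)


def distributional_loss(student, teacher, noises):
    """Empirical 1-D Wasserstein-1 distance between the two sets of outputs.

    Sorting discards the noise-to-output pairing, so only the output
    distributions are compared (an illustrative stand-in, assumed here).
    """
    s = sorted(student(z) for z in noises)
    t = sorted(teacher(z) for z in noises)
    return sum(abs(a - b) for a, b in zip(s, t)) / len(noises)


def teacher(z):
    # Hypothetical teacher: a fixed map from noise to sample.
    return 2.0 * z


def student(z):
    # Hypothetical student: a different map with the same output distribution
    # whenever the noise is symmetric around zero.
    return -2.0 * z


rng = random.Random(0)
half = [rng.gauss(0.0, 1.0) for _ in range(500)]
noises = half + [-z for z in half]  # symmetric noise sample

inst = instance_loss(student, teacher, noises)
dist = distributional_loss(student, teacher, noises)
print(f"instance loss: {inst:.3f}, distributional loss: {dist:.3f}")
```

An instance-based objective penalizes this student heavily even though its samples are distributed identically to the teacher's, while the distributional objective does not constrain which noise maps to which sample.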

