In this work, we investigate an intriguing and prevalent phenomenon of diffusion models which we term "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion models often yield remarkably similar outputs. We confirm this phenomenon through comprehensive experiments, implying that different diffusion models consistently converge to the same data distribution and score function regardless of diffusion model frameworks, model architectures, or training procedures. More strikingly, our further investigation implies that diffusion models learn distinct distributions depending on the training data size. This is supported by the fact that model reproducibility manifests in two distinct training regimes: (i) the "memorization regime," where the diffusion model overfits to the training data distribution, and (ii) the "generalization regime," where the model learns the underlying data distribution. Our study also finds that this valuable property generalizes to many variants of diffusion models, including those for conditional generation, solving inverse problems, and model fine-tuning. Finally, our work raises numerous intriguing theoretical questions for future investigation and highlights practical implications regarding training efficiency, model privacy, and the controlled generation of diffusion models.
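The reproducibility check described above can be illustrated with a minimal 1-D sketch. Everything below is a hypothetical toy setup, not the paper's actual models or experiments: two "independently trained" models are simulated as slightly perturbed analytic score estimates for Gaussian data, and the deterministic sampler is a simple Euler discretization of the probability-flow ODE. Started from the same noise, the two models produce nearly identical samples.

```python
import numpy as np

def score(x, t, mu_hat, s0=1.0):
    # Score of p_t = N(mu_hat, s0^2 + t^2) under a VE-style diffusion
    # applied to 1-D Gaussian data N(mu_hat, s0^2). mu_hat stands in for
    # what a trained model has learned (hypothetical).
    return -(x - mu_hat) / (s0**2 + t**2)

def sample_pf_ode(x_T, mu_hat, T=10.0, steps=1000, s0=1.0):
    # Deterministic sampler: Euler integration of the probability-flow ODE
    # dx/dt = -t * score(x, t) from t = T down to t = 0.
    x, dt = x_T, T / steps
    for i in range(steps):
        t = T - i * dt
        x = x + dt * t * score(x, t, mu_hat, s0)
    return x

# Shared starting noise for both "models" (true data mean taken as 0 here).
rng = np.random.default_rng(0)
x_T = 10.0 * rng.standard_normal()

# Two "independently trained" models, simulated as slightly different
# score estimates (the +/- 0.02 perturbation mimics training variation).
out_a = sample_pf_ode(x_T, mu_hat=+0.02)
out_b = sample_pf_ode(x_T, mu_hat=-0.02)

# Despite distinct scores, the same noise maps to nearly the same sample.
print(abs(out_a - out_b))
```

In this toy setting the closeness can be verified analytically: the probability-flow ODE contracts the initial noise toward the learned mean, so the output gap shrinks to a small fraction of the gap between the two score estimates.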