The Emergence of Reproducibility and Consistency in Diffusion Models

In this work, we investigate an intriguing and prevalent phenomenon in diffusion models, which we term "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion models often yield remarkably similar outputs. We confirm this phenomenon through comprehensive experiments, which imply that different diffusion models consistently reach the same data distribution and score function regardless of diffusion model framework, model architecture, or training procedure. More strikingly, our further investigation implies that the distribution a diffusion model learns depends on the training data size. This is supported by the fact that model reproducibility manifests in two distinct training regimes: (i) a "memorization regime", where the diffusion model overfits to the training data distribution, and (ii) a "generalization regime", where the model learns the underlying data distribution. Our study also finds that this valuable property generalizes to many variants of diffusion models, including conditional diffusion models, diffusion models for solving inverse problems, and fine-tuned diffusion models. Finally, our work raises numerous intriguing theoretical questions for future investigation and highlights practical implications for training efficiency, model privacy, and controlled generation with diffusion models.
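As an illustration of how "consistent model reproducibility" might be quantified, the sketch below feeds the same noise batch to two deterministic generators and reports the fraction of output pairs whose similarity exceeds a threshold. Everything here is an assumption made for illustration and not the authors' evaluation protocol: the function name `reproducibility_score`, the use of cosine similarity on raw flattened outputs, the threshold of 0.6, and the two toy linear maps standing in for trained diffusion models with a deterministic sampler.

```python
import numpy as np


def reproducibility_score(outputs_a, outputs_b, threshold=0.6):
    """Fraction of paired outputs whose cosine similarity exceeds `threshold`.

    outputs_a[i] and outputs_b[i] are assumed to be generated from the SAME
    initial noise by two different models. The similarity measure and the
    threshold are illustrative assumptions.
    """
    a = np.asarray(outputs_a, dtype=float)
    b = np.asarray(outputs_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both models must produce outputs of the same shape")
    # Flatten each sample (e.g. an image) into a vector.
    a = a.reshape(a.shape[0], -1)
    b = b.reshape(b.shape[0], -1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    cosine = np.sum(a * b, axis=1) / np.maximum(norms, 1e-12)
    return float(np.mean(cosine > threshold))


# Toy stand-ins for "noise -> sample" maps of two independently trained models.
rng = np.random.default_rng(0)
weights = rng.standard_normal((8, 8))
perturbed = weights + 0.01 * rng.standard_normal((8, 8))


def model_a(noise):
    return noise @ weights


def model_b(noise):
    return noise @ perturbed


shared_noise = rng.standard_normal((16, 8))  # identical starting noise for both
score = reproducibility_score(model_a(shared_noise), model_b(shared_noise))
print(f"reproducibility score: {score:.2f}")
```

A score near 1 would indicate that the two generators map the same noise to nearly the same outputs; with real diffusion models, the maps would be replaced by each model's deterministic sampling procedure and the similarity would more plausibly be computed in a learned feature space.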

