Modern successes of diffusion models in learning complex, high-dimensional data distributions are attributed, in part, to their ability to construct diffusion processes with analytic transition kernels and score functions. This tractability yields a simulation-free framework with stable regression losses, from which reverse generative processes can be learned at scale. However, when data are confined to a constrained set rather than a standard Euclidean space, prior attempts suggest that these desirable characteristics are lost. In this work, we propose Mirror Diffusion Models (MDM), a new class of diffusion models that generate data on convex constrained sets without losing any tractability. This is achieved by learning diffusion processes in a dual space constructed from a mirror map, which, crucially, is a standard Euclidean space. We derive efficient computations of mirror maps for popular constrained sets, such as simplices and $\ell_2$-balls, and show that MDM significantly outperforms existing methods. For safety and privacy purposes, we also explore constrained sets as a new mechanism for embedding invisible but quantitative information (i.e., watermarks) in generated data, for which MDM serves as a compelling approach. Our work brings new algorithmic opportunities for learning tractable diffusion on complex domains. Our code is available at https://github.com/ghliu/mdm.