The recent success of diffusion models in learning complex, high-dimensional data distributions is attributed, in part, to their ability to construct diffusion processes with analytic transition kernels and score functions. This tractability yields a simulation-free framework with stable regression losses, from which reversed, generative processes can be learned at scale. However, when data is confined to a constrained set rather than a standard Euclidean space, prior attempts suggest that these desirable characteristics are lost. In this work, we propose Mirror Diffusion Models (MDM), a new class of diffusion models that generate data on convex constrained sets without losing any tractability. This is achieved by learning diffusion processes in a dual space that is constructed from a mirror map and, crucially, is a standard Euclidean space. We derive efficient computation of mirror maps for popular constrained sets, such as simplices and $\ell_2$-balls, and show significantly improved performance of MDM over existing methods. For safety and privacy purposes, we also explore constrained sets as a new mechanism to embed invisible but quantitative information (i.e., watermarks) in generated data, for which MDM serves as a compelling approach. Our work brings new algorithmic opportunities for learning tractable diffusion on complex domains.
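To make the dual-space idea concrete, the following is a minimal sketch of mirror maps for the two constrained sets named above. It is illustrative only and is not taken from the paper: the potentials are assumptions (a log-barrier $\phi(x) = -\log(R^2 - \|x\|^2)$ for the $\ell_2$-ball of radius $R$, and a negative-entropy potential on the simplex written in its first $d$ coordinates), the function names are hypothetical, and a Gaussian sample stands in for the output of a learned dual-space diffusion. The point it shows is that any point of the unconstrained dual space maps back to a point that satisfies the constraint by construction.

```python
import numpy as np


def ball_to_dual(x, radius=1.0):
    """Mirror map (gradient of the assumed log-barrier): open l2-ball -> R^d."""
    sq_norm = np.sum(x * x, axis=-1, keepdims=True)
    return 2.0 * x / (radius**2 - sq_norm)


def dual_to_ball(y, radius=1.0):
    """Inverse mirror map: R^d -> open l2-ball of the given radius.

    Closed form of the inverse, written so that it is stable at y = 0.
    """
    sq_norm = np.sum(y * y, axis=-1, keepdims=True)
    return radius**2 * y / (1.0 + np.sqrt(1.0 + radius**2 * sq_norm))


def simplex_to_dual(x):
    """Mirror map for the simplex in reduced coordinates.

    x holds the first d coordinates (x_i > 0, sum(x) < 1); the remaining
    coordinate 1 - sum(x) is implicit.
    """
    last = 1.0 - np.sum(x, axis=-1, keepdims=True)
    return np.log(x) - np.log(last)


def dual_to_simplex(y):
    """Inverse mirror map: softmax over [y, 0], dropping the last coordinate."""
    zeros = np.zeros(y.shape[:-1] + (1,))
    logits = np.concatenate([y, zeros], axis=-1)
    logits = logits - np.max(logits, axis=-1, keepdims=True)  # avoid overflow
    weights = np.exp(logits)
    return (weights / np.sum(weights, axis=-1, keepdims=True))[..., :-1]


# Placeholder for samples produced by a diffusion model in the dual space.
rng = np.random.default_rng(0)
y = 3.0 * rng.standard_normal((1000, 4))

x_ball = dual_to_ball(y)        # every row has norm strictly below 1
x_simplex = dual_to_simplex(y)  # every row is positive and sums to below 1
```

Because both inverse maps are closed-form, mapping generated dual samples back to the constrained set costs one vectorized pass and needs no projection or rejection step.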