Learning the distribution of data on Riemannian manifolds is crucial for modeling data from non-Euclidean spaces, which is required by many applications in diverse scientific fields. Yet existing generative models on manifolds either suffer from expensive divergence computation or rely on approximations of the heat kernel. These limitations restrict their applicability to simple geometries and hinder scalability to high dimensions. In this work, we introduce the Riemannian Diffusion Mixture, a principled framework for building a generative diffusion process on manifolds. Instead of following the denoising approach of previous diffusion models, we construct a diffusion process using a mixture of bridge processes derived on general manifolds, without requiring heat kernel estimation. We develop a geometric understanding of the mixture process, deriving the drift as a weighted mean of tangent directions to the data points that guides the process toward the data distribution. We further propose a scalable training objective for learning the mixture process that readily applies to general manifolds. Our method achieves superior performance on diverse manifolds while requiring dramatically fewer in-training simulation steps on general manifolds.
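To make the geometric reading of the drift concrete, the following is a minimal sketch on the unit sphere, where a "tangent direction to a data point" can be taken to be the logarithm map at the current point. The function names (`sphere_log`, `sphere_exp`, `placeholder_weights`, `mixture_drift`) and the temperature `tau` are hypothetical, introduced only for illustration. In particular, the softmax-of-distance weights are an assumption: the abstract does not specify the weighting, which in the actual framework is determined by the bridge processes.

```python
import numpy as np


def sphere_log(x, y, eps=1e-12):
    """Log map on the unit sphere: tangent vector at x pointing toward y,
    with length equal to the geodesic distance between x and y."""
    cos_theta = np.clip(np.dot(x, y), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    v = y - cos_theta * x  # component of y orthogonal to x
    norm_v = np.linalg.norm(v)
    if norm_v < eps:  # y coincides with x (or is antipodal): no unique direction
        return np.zeros_like(x)
    return theta * v / norm_v


def sphere_exp(x, v, eps=1e-12):
    """Exp map on the unit sphere: follow the geodesic from x along v."""
    norm_v = np.linalg.norm(v)
    if norm_v < eps:
        return x
    return np.cos(norm_v) * x + np.sin(norm_v) * v / norm_v


def placeholder_weights(x, data, tau=1.0):
    """ASSUMPTION: illustrative softmax weights that favor nearby data points.
    These are not the weights derived in the paper."""
    d2 = np.array([np.linalg.norm(sphere_log(x, y)) ** 2 for y in data])
    logits = -d2 / (2.0 * tau)
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()


def mixture_drift(x, data, weights):
    """Weighted mean of the tangent directions from x to each data point."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    directions = np.stack([sphere_log(x, y) for y in data])
    return w @ directions


# Usage: one small step from the north pole toward two data points.
x = np.array([0.0, 0.0, 1.0])
data = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
drift = mixture_drift(x, data, placeholder_weights(x, data))
x_next = sphere_exp(x, 0.1 * drift)
```

Because each log-map vector lies in the tangent space at `x`, their weighted mean does too, so stepping with the exponential map keeps the iterate on the manifold. The noise term of the diffusion is omitted here for brevity.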