High-dimensional data are often modeled as lying near a low-dimensional manifold. We study how to construct diffusion processes on this data manifold in the implicit setting, that is, using only point cloud samples and without access to charts, projections, or other geometric primitives. Our main contribution is a data-driven SDE that is defined in ambient space yet captures intrinsic diffusion on the underlying manifold. The construction relies on estimating the diffusion's infinitesimal generator and its carré-du-champ (CDC) from a proximity graph built from the data; together, the generator and CDC encode the local stochastic and geometric structure of the intended diffusion. We show that, as the number of samples grows, the induced process converges in law on the space of probability paths to its smooth manifold counterpart. We call this construction Implicit Manifold-valued Diffusions (IMDs) and present a numerical simulation procedure based on Euler-Maruyama integration. This gives a rigorous basis for practical implementations of diffusion dynamics on data manifolds and opens new directions for manifold-aware sampling, exploration, and generative modeling.
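To make the ingredients named above concrete, the following is a minimal sketch of one way a generator/CDC-driven Euler-Maruyama scheme could look. It is an illustration under stated assumptions, not the construction analyzed in this work. The assumptions are: a dense Gaussian-kernel graph with bandwidth `eps` in place of a general proximity graph, a row-normalized Markov operator $P$ with generator estimate $(P - I)/\varepsilon$, and evaluation at off-sample points by re-evaluating the kernel at the current state. The names `local_drift_and_cov` and `simulate` are hypothetical. For this choice of generator, the CDC of the coordinate functions reduces to a weighted second moment of the displacements, which is what the code uses as diffusion covariance.

```python
import numpy as np


def local_drift_and_cov(x, data, eps):
    """Kernel estimates of drift and diffusion covariance at a point x.

    Assumption: Gaussian weights w_j ~ exp(-|x_j - x|^2 / (4 * eps)),
    normalized to sum to one (a row of a Markov operator P).
    drift = ((P - I) applied to the coordinate functions) / eps
    cov   = 2 * CDC of the coordinate functions
          = sum_j w_j (x_j - x)(x_j - x)^T / eps
    """
    diffs = data - x                                  # (n, D) displacements
    sq = np.einsum("ij,ij->i", diffs, diffs)          # squared distances
    # Shift by the smallest distance so the exponentials cannot all underflow.
    w = np.exp(-(sq - sq.min()) / (4.0 * eps))
    w /= w.sum()
    drift = (w @ diffs) / eps
    cov = (diffs.T * w) @ diffs / eps
    return drift, cov


def simulate(x0, data, eps, dt, n_steps, rng):
    """Euler-Maruyama integration of dX = drift(X) dt + cov(X)^{1/2} dW."""
    x = np.asarray(x0, dtype=float)
    path = np.empty((n_steps + 1, x.size))
    path[0] = x
    for k in range(n_steps):
        drift, cov = local_drift_and_cov(x, data, eps)
        # Symmetric square-root factor of the covariance via eigendecomposition.
        vals, vecs = np.linalg.eigh(cov)
        root = vecs * np.sqrt(np.clip(vals, 0.0, None))   # cov = root @ root.T
        noise = root @ rng.standard_normal(x.size)
        x = x + drift * dt + np.sqrt(dt) * noise
        path[k + 1] = x
    return path


# Toy usage: points on the unit circle embedded in the plane.
theta = np.linspace(0.0, 2.0 * np.pi, 500, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
rng = np.random.default_rng(0)
path = simulate(circle[0], circle, eps=0.01, dt=0.001, n_steps=200, rng=rng)
```

On the circle, the estimated covariance is concentrated along the tangent direction, and the estimated drift points inward. This inward drift compensates the outward displacement caused by taking straight tangent steps in ambient space, so the simulated path is expected to stay close to the data. With non-uniform sampling the kernel drift also picks up a density-dependent term, which this sketch does not correct for.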