Euclidean diffusion models have achieved remarkable success in generative modeling across diverse domains, and recent work has extended them to manifolds. Rather than explicitly exploiting the structure of special manifolds, as previous works do, this paper investigates direct sampling from Euclidean diffusion models for general manifold-structured data. We reveal a multiscale singularity of the score function in the ambient space, which degrades the accuracy of diffusion-generated samples. We then analyze the structure of this singularity theoretically by decomposing the score function along the tangential and normal directions of the manifold. To mitigate the singularity and improve sampling accuracy, we propose two methods: (1) Niso-DM, which reduces the scale discrepancies in the score function by using non-isotropic noise, and (2) Tango-DM, which trains only the tangential component of the score function using a tangential-only loss function. Numerical experiments demonstrate that our methods achieve superior performance on distributions over various manifolds with complex geometries.
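The tangential/normal decomposition mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible manifold, a sphere centred at the origin, where the normal direction at a point x is the radial direction. The function names `split_tangent_normal` and `tangential_only_sq_error` are hypothetical and are not taken from the paper. The second function only conveys the general idea of comparing tangential components while ignoring normal ones; it should not be read as the actual Tango-DM loss.

```python
def split_tangent_normal(x, v):
    """Split vector v into tangential and normal parts at a nonzero point x.

    Assumption: the manifold is a sphere centred at the origin, so the
    normal direction at x is x itself (the radial direction).
    """
    norm_sq = sum(xi * xi for xi in x)
    # Coefficient of the orthogonal projection of v onto the radial direction.
    coeff = sum(xi * vi for xi, vi in zip(x, v)) / norm_sq
    normal = [coeff * xi for xi in x]
    tangential = [vi - ni for vi, ni in zip(v, normal)]
    return tangential, normal


def tangential_only_sq_error(x, predicted, target):
    """Squared error between the tangential parts of two vectors at x.

    Illustrative only: the normal components are discarded before the
    comparison, so errors along the normal direction do not contribute.
    """
    pred_t, _ = split_tangent_normal(x, predicted)
    targ_t, _ = split_tangent_normal(x, target)
    return sum((p - t) ** 2 for p, t in zip(pred_t, targ_t))


if __name__ == "__main__":
    x = [0.0, 0.0, 1.0]   # north pole of the unit sphere
    v = [1.0, 2.0, 3.0]   # an arbitrary ambient-space vector
    tangential, normal = split_tangent_normal(x, v)
    print("tangential:", tangential)
    print("normal:", normal)
```

In this toy setting, two vectors that differ only along the radial direction have zero tangential-only error. This mirrors, in spirit, the abstract's idea of training only the tangential component of the score.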