Diffusion models are powerful deep generative models, but unlike classical models, they lack an explicit low-dimensional latent space that parameterizes the data manifold. This absence makes it difficult to perform manifold-aware operations, such as geometrically faithful interpolation or conditional guidance that respects the learned manifold. We propose a training-free Riemannian metric on the noise space, derived from the Jacobian of the score function. The key insight is that the spectral structure of this Jacobian separates tangent and normal directions of the data manifold; our metric leverages this separation to encourage paths to stay tangential to the manifold rather than drift toward high-density regions. To validate that our metric faithfully captures the manifold geometry, we examine it from two complementary angles. First, geodesics under our metric yield perceptually more natural interpolations than existing methods on synthetic, image, and video frame datasets. Second, the tangent-normal decomposition induced by our metric prevents classifier-free guidance from deviating off the manifold, improving generation quality while preserving text-image alignment.
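To make the tangent-normal separation concrete, here is a minimal NumPy sketch on a toy problem. It illustrates the idea and is not the proposed metric itself. It assumes a hypothetical setting where the data lie on a line in R^2. In that case the noised density is Gaussian and its score has a closed form. The helper names (`score`, `score_jacobian`, `tangent_normal_split`, `metric`, `project_tangent`) and the choice G = -sym(∇s) are illustrative assumptions, and the actual metric may be defined differently.

The sketch relies on the log-density varying slowly along the manifold and sharply across it. As a result, eigenvectors of the score Jacobian with small-magnitude eigenvalues approximate tangent directions, and those with large-magnitude eigenvalues approximate normal directions. Projecting a guidance vector onto the tangent eigenvectors removes its off-manifold component.

```python
import numpy as np

# Toy setting (assumption): data lie on the x-axis of R^2, a 1-D linear "manifold",
# with a tiny off-manifold spread eps; sigma is the diffusion noise level.
# The noised density is N(0, Sigma) with Sigma = diag(1 + sigma^2, eps^2 + sigma^2),
# so the score is s(x) = -Sigma^{-1} x.

def score(x, sigma, eps=1e-2):
    """Closed-form score of the noised toy density (hypothetical stand-in for a network)."""
    cov = np.diag([1.0 + sigma**2, eps**2 + sigma**2])
    return -np.linalg.solve(cov, x)

def score_jacobian(x, sigma, h=1e-5):
    """Central finite-difference Jacobian of the score at x."""
    d = x.shape[0]
    jac = np.zeros((d, d))
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        jac[:, i] = (score(x + e, sigma) - score(x - e, sigma)) / (2 * h)
    return jac

def tangent_normal_split(jac, k):
    """Split directions by eigenvalue magnitude of the symmetrized Jacobian:
    the k smallest |lambda| span the estimated tangent space, the rest the normal space."""
    sym = 0.5 * (jac + jac.T)
    vals, vecs = np.linalg.eigh(sym)      # ascending eigenvalues (all negative here)
    order = np.argsort(np.abs(vals))      # smallest magnitude first
    vals, vecs = vals[order], vecs[:, order]
    return vals, vecs[:, :k], vecs[:, k:]

def metric(jac):
    """Hypothetical metric G = -sym(J); positive definite for this toy density,
    so moving along normal (large |lambda|) directions costs more."""
    return -0.5 * (jac + jac.T)

def project_tangent(v, tangent):
    """Drop the normal component of v, e.g. a classifier-free guidance direction."""
    return tangent @ (tangent.T @ v)

# Usage on a point lying on the manifold.
x = np.array([0.3, 0.0])
J = score_jacobian(x, sigma=0.1)
vals, T, N = tangent_normal_split(J, k=1)
G = metric(J)
u_tan, u_nor = np.array([1.0, 0.0]), np.array([0.0, 1.0])
cost_tan = u_tan @ G @ u_tan                        # ≈ 0.99, cheap along the manifold
cost_nor = u_nor @ G @ u_nor                        # ≈ 99.0, expensive off the manifold
g_proj = project_tangent(np.array([0.5, 2.0]), T)   # normal part of the guidance removed
```

With a learned score network, one would presumably replace the finite-difference Jacobian with automatic differentiation and set `k` to an estimated manifold dimension. Both choices are assumptions of this sketch.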