What's Inside Your Diffusion Model? A Score-Based Riemannian Metric to Explore the Data Manifold

Recent advances in diffusion models have demonstrated their remarkable ability to capture complex image distributions, but the geometric properties of the learned data manifold remain poorly understood. We address this gap by introducing a score-based Riemannian metric that leverages the Stein score function from diffusion models to characterize the intrinsic geometry of the data manifold without requiring explicit parameterization. Our approach defines a metric tensor in the ambient space that stretches distances perpendicular to the manifold while preserving them along tangential directions, effectively creating a geometry where geodesics naturally follow the manifold's contours. We develop efficient algorithms for computing these geodesics and demonstrate their utility for both interpolation between data points and extrapolation beyond the observed data distribution. Through experiments on synthetic data with known geometry, Rotated MNIST, and complex natural images via Stable Diffusion, we show that our score-based geodesics capture meaningful transformations that respect the underlying data distribution. Our method consistently outperforms baseline approaches on perceptual metrics (LPIPS) and distribution-level metrics (FID, KID), producing smoother, more realistic image transitions. These results reveal the implicit geometric structure learned by diffusion models and provide a principled way to navigate the manifold of natural images through the lens of Riemannian geometry.
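The abstract describes the metric only qualitatively. The sketch below illustrates the idea under one assumed form, G(x) = I + λ s(x) s(x)ᵀ, where s is the score. This form adds cost only along the score direction. When the score points roughly normal to the manifold, it stretches off-manifold motion and leaves tangential motion unchanged. The form, the toy circle density, and the names `SIGMA`, `LAM`, `score`, `score_jvp`, `path_energy`, and `geodesic` are all hypothetical. They are not the paper's construction or code. A real application would take s from a trained diffusion model rather than from a closed form. The geodesic is approximated by minimizing a discretized path energy with SciPy's `scipy.optimize.minimize` (L-BFGS-B).

```python
import numpy as np
from scipy.optimize import minimize

# Toy data (assumption): points concentrated near the unit circle in R^2,
# p(x) proportional to exp(-(|x| - 1)^2 / (2 * SIGMA^2)). Its exact score is
# radial, i.e. normal to the circle, and vanishes on the circle itself.
SIGMA = 0.1  # hypothetical noise scale of the toy density
LAM = 1.0    # hypothetical strength of the stretching term in G


def score(x):
    """Exact score grad log p(x) of the toy density; x has shape (..., 2)."""
    r = np.linalg.norm(x, axis=-1, keepdims=True)
    return -(1.0 - 1.0 / r) * x / SIGMA**2


def score_jvp(x, v):
    """Jacobian of the score at x applied to v (the Jacobian is symmetric)."""
    r = np.linalg.norm(x, axis=-1, keepdims=True)
    xh = x / r
    tangential = v - xh * np.sum(xh * v, axis=-1, keepdims=True)
    return -(v - tangential / r) / SIGMA**2


def path_energy(flat, x0, x1, n_seg):
    """Discrete energy n_seg * sum_i d_i^T G(m_i) d_i, with G = I + LAM * s s^T.

    flat holds the (n_seg - 1) interior nodes; the endpoints stay fixed.
    Returns the energy and its gradient with respect to flat.
    """
    nodes = np.vstack([x0, flat.reshape(n_seg - 1, 2), x1])
    d = nodes[1:] - nodes[:-1]           # segment vectors d_i
    m = 0.5 * (nodes[1:] + nodes[:-1])   # segment midpoints m_i
    s = score(m)
    a = np.sum(s * d, axis=-1, keepdims=True)  # a_i = s(m_i) . d_i
    energy = n_seg * np.sum(np.sum(d * d, axis=-1, keepdims=True) + LAM * a**2)

    # Chain rule: each node enters d_{i-1}, d_i and the two adjacent midpoints.
    g_d = n_seg * (2.0 * d + 2.0 * LAM * a * s)      # dE/dd_i
    g_m = n_seg * 2.0 * LAM * a * score_jvp(m, d)    # dE/dm_i
    g_nodes = np.zeros_like(nodes)
    g_nodes[:-1] += -g_d + 0.5 * g_m
    g_nodes[1:] += g_d + 0.5 * g_m
    return energy, g_nodes[1:-1].ravel()


def geodesic(x0, x1, n_seg=32):
    """Approximate a geodesic by minimizing the discrete energy from a straight line."""
    x0, x1 = np.asarray(x0, dtype=float), np.asarray(x1, dtype=float)
    t = np.linspace(0.0, 1.0, n_seg + 1)[1:-1, None]
    init = (1.0 - t) * x0 + t * x1
    res = minimize(path_energy, init.ravel(), args=(x0, x1, n_seg),
                   jac=True, method="L-BFGS-B", options={"maxiter": 5000})
    return np.vstack([x0, res.x.reshape(n_seg - 1, 2), x1])


if __name__ == "__main__":
    path = geodesic([1.0, 0.0], [0.0, 1.0])
    # The straight chord's midpoint has radius ~0.707; the geodesic stays close to 1.
    print("geodesic midpoint radius:", np.linalg.norm(path[len(path) // 2]))
```

The optimizer starts from the straight chord between (1, 0) and (0, 1) and bends the path toward the unit circle. Cutting through the interior forces motion along the score direction, where the added metric term is large. Because the toy score vanishes on the circle, this particular G is exactly Euclidean there, so the stretching acts only in a neighborhood of the manifold. The sketch minimizes energy rather than length because energy minimizers are constant-speed geodesics and the objective is smooth.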
