We present a novel geometric perspective on the latent space of diffusion models. We first show that the standard pullback approach, which uses the deterministic probability flow ODE as the decoder, is fundamentally flawed: it provably forces geodesics to decode as straight segments in data space, effectively ignoring any intrinsic data geometry beyond that of the ambient Euclidean space. Diffusion models also admit a stochastic decoder via the reverse SDE, which enables an information-geometric treatment with the Fisher-Rao metric. However, choosing $x_T$ as the latent representation collapses this metric due to memorylessness. We address this by introducing a latent spacetime $z=(x_t,t)$ that indexes the family of denoising distributions $p(x_0 | x_t)$ across all noise scales, yielding a nontrivial geometric structure. We prove that these distributions form an exponential family and derive simulation-free estimators for curve lengths, enabling efficient geodesic computation. The resulting structure induces a principled Diffusion Edit Distance, where geodesics trace minimal sequences of noise and denoise edits between data points. We also demonstrate benefits for transition path sampling in molecular systems, including constrained variants such as low-variance transitions and region avoidance. Code is available at: https://github.com/rafalkarczewski/spacetime-geometry.
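To make the exponential-family claim concrete, the following is a minimal illustrative sketch, not the released implementation. It assumes a Gaussian forward process $x_t = \alpha_t x_0 + \sigma_t \varepsilon$, under which $p(x_0 | x_t) \propto p(x_0)\exp(\eta^\top T(x_0))$ with sufficient statistic $T(x_0) = (x_0, \|x_0\|^2)$ and natural parameter $\eta = (\alpha_t x_t/\sigma_t^2,\, -\alpha_t^2/(2\sigma_t^2))$. For an exponential family, the squared Fisher-Rao length of a short segment can be approximated by $\langle \Delta\eta, \Delta\mu \rangle$, where $\mu = \mathbb{E}[T(x_0) | x_t]$, so no sampling along the curve is needed. The function names and the one-dimensional toy prior $x_0 \sim \mathcal{N}(0,1)$ are assumptions made here so that $\mu$ is available in closed form.

```python
import math


def posterior_params(x_t, alpha, sigma):
    """Natural and expectation parameters of p(x_0 | x_t).

    Assumptions (for illustration only): scalar data with toy prior
    x_0 ~ N(0, 1) and forward process x_t = alpha * x_0 + sigma * eps.
    """
    # Natural parameters paired with the sufficient statistic T(x_0) = (x_0, x_0**2).
    eta = (alpha * x_t / sigma**2, -alpha**2 / (2.0 * sigma**2))
    # Closed-form Gaussian posterior; available only because the toy prior is Gaussian.
    var = sigma**2 / (sigma**2 + alpha**2)
    mean = var * alpha * x_t / sigma**2
    mu = (mean, mean**2 + var)  # E[x_0 | x_t], E[x_0**2 | x_t]
    return eta, mu


def curve_length(points):
    """Discrete Fisher-Rao length of a spacetime curve.

    `points` is a list of (x_t, alpha, sigma) triples along the curve.
    """
    params = [posterior_params(*p) for p in points]
    total = 0.0
    for (eta_a, mu_a), (eta_b, mu_b) in zip(params, params[1:]):
        # Squared segment length <delta eta, delta mu>; it is nonnegative in
        # exact arithmetic, so the clamp only guards against rounding error.
        sq = sum((eb - ea) * (mb - ma)
                 for ea, eb, ma, mb in zip(eta_a, eta_b, mu_a, mu_b))
        total += math.sqrt(max(sq, 0.0))
    return total


def constant_noise_curve(x_start, x_end, alpha, sigma, n=50):
    """Straight line in x_t at a fixed noise level, sampled at n + 1 points."""
    return [(x_start + (x_end - x_start) * i / n, alpha, sigma)
            for i in range(n + 1)]
```

At a fixed noise level the posterior variance $v = \sigma^2/(\sigma^2+\alpha^2)$ is constant, so the length of such a curve should agree with the closed-form value $|\Delta m|/\sqrt{v}$, where $m$ is the posterior mean. For example, `curve_length(constant_noise_curve(-1.0, 2.0, 0.8, 0.6))` can be checked against that expression.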