This paper aims to build the theoretical foundations for manifold learning algorithms in the space of absolutely continuous probability measures on a compact and convex subset of $\mathbb{R}^d$, metrized by the Wasserstein-2 distance $W$. We begin by introducing a natural construction of submanifolds $\Lambda$ of probability measures equipped with the metric $W_\Lambda$, the geodesic restriction of $W$ to $\Lambda$. In contrast to other constructions, these submanifolds are not necessarily flat, but they still allow for local linearizations, much as Riemannian submanifolds of $\mathbb{R}^d$ do. We then show how the latent manifold structure of $(\Lambda,W_{\Lambda})$ can be learned using only samples $\{\lambda_i\}_{i=1}^N$ of $\Lambda$ and their pairwise extrinsic Wasserstein distances $W$. In particular, we show that the metric space $(\Lambda,W_{\Lambda})$ can be asymptotically recovered in the sense of Gromov--Wasserstein from a graph with nodes $\{\lambda_i\}_{i=1}^N$ and edge weights $W(\lambda_i,\lambda_j)$. In addition, we demonstrate how the tangent space at a sample $\lambda$ can be asymptotically recovered via spectral analysis of a suitable "covariance operator" built from the optimal transport maps from $\lambda$ to sufficiently close and diverse samples $\{\lambda_i\}_{i=1}^N$. The paper closes with some explicit constructions of submanifolds $\Lambda$ and numerical examples of tangent space recovery through spectral analysis.
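The tangent space recovery described above can be illustrated with a minimal one-dimensional sketch. It is not the paper's algorithm or its numerical setup. It relies on the standard fact that, for measures on the real line, the optimal transport map from an atomless $\lambda$ to $\mu$ is $Q_\mu \circ F_\lambda$, so displacements can be expressed as differences of quantile functions in $L^2([0,1])$. The two-parameter family `quantile`, the helper names, the sampling radius `r`, and the use of an uncentered second-moment matrix as the "covariance operator" are all assumptions made for illustration.

```python
import numpy as np

def quantile(s, t, u):
    # Hypothetical 2-parameter family of measures on [0, 1], described by
    # their quantile functions. The quadratic term makes the family curved.
    # For |s|, |t| <= 0.2 the function is strictly increasing in u.
    return (u
            + s * np.sin(np.pi * u) / np.pi
            + t * np.sin(2 * np.pi * u) / (2 * np.pi)
            + (s**2 + t**2) * np.sin(3 * np.pi * u) / (3 * np.pi))

def estimate_tangent_space(base, samples, u, dim):
    # In 1D the displacement T_i - id of the optimal map from the base
    # measure to sample i, written in quantile coordinates, is Q_i - Q_base.
    du = u[1] - u[0]
    q0 = quantile(*base, u)
    V = np.stack([quantile(s, t, u) - q0 for s, t in samples])
    # Discretized (uncentered) second-moment operator of the displacements.
    C = V.T @ V * du / len(samples)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues descending
    return eigvals[order], eigvecs[:, order[:dim]]

def subspace_error(A, B):
    # Sine of the largest principal angle between the column spans of A and B.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    sigma = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return float(np.sqrt(max(0.0, 1.0 - sigma.min() ** 2)))

rng = np.random.default_rng(0)
M, N, r = 400, 200, 0.02                       # grid size, sample count, radius
u = (np.arange(M) + 0.5) / M                   # midpoint grid on (0, 1)
base = (0.1, -0.05)
samples = np.array(base) + rng.uniform(-r, r, size=(N, 2))

eigvals, basis = estimate_tangent_space(base, samples, u, dim=2)

# Analytic tangent vectors of the assumed family at the base point:
# the partial derivatives of quantile(s, t, u) with respect to s and t.
true_tangent = np.stack([
    np.sin(np.pi * u) / np.pi
    + 2 * base[0] * np.sin(3 * np.pi * u) / (3 * np.pi),
    np.sin(2 * np.pi * u) / (2 * np.pi)
    + 2 * base[1] * np.sin(3 * np.pi * u) / (3 * np.pi),
], axis=1)

err = subspace_error(basis, true_tangent)
print("leading eigenvalues:", eigvals[:4])
print("sine of largest principal angle:", err)
```

In this toy setting the spectrum should show two dominant eigenvalues followed by a sharp drop, which matches the two-dimensional parameterization, and the span of the leading eigenvectors should lie close to the analytic tangent space. Shrinking `r` reduces the influence of the curvature term at the cost of smaller displacements.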