This paper aims to build the theoretical foundations for manifold learning algorithms in the space of absolutely continuous probability measures $\mathcal{P}_{\mathrm{a.c.}}(\Omega)$, where $\Omega$ is a compact and convex subset of $\mathbb{R}^d$, metrized with the Wasserstein-2 distance $\mathbb{W}$. We begin by introducing a construction of submanifolds $\Lambda$ in $\mathcal{P}_{\mathrm{a.c.}}(\Omega)$ equipped with the metric $\mathbb{W}_\Lambda$, the geodesic restriction of $\mathbb{W}$ to $\Lambda$. In contrast to other constructions, these submanifolds are not necessarily flat, but they still allow for local linearizations in a similar fashion to Riemannian submanifolds of $\mathbb{R}^d$. We then show how the latent manifold structure of $(\Lambda,\mathbb{W}_{\Lambda})$ can be learned using only samples $\{\lambda_i\}_{i=1}^N$ from $\Lambda$ and their pairwise extrinsic Wasserstein distances $\mathbb{W}$ on $\mathcal{P}_{\mathrm{a.c.}}(\Omega)$. In particular, we show that the metric space $(\Lambda,\mathbb{W}_{\Lambda})$ can be asymptotically recovered in the sense of Gromov--Wasserstein from a graph with nodes $\{\lambda_i\}_{i=1}^N$ and edge weights $\mathbb{W}(\lambda_i,\lambda_j)$. In addition, we demonstrate how the tangent space at a sample $\lambda$ can be asymptotically recovered via spectral analysis of a suitable ``covariance operator'' built from optimal transport maps from $\lambda$ to sufficiently close and diverse samples $\{\lambda_i\}_{i=1}^N$. The paper closes with some explicit constructions of submanifolds $\Lambda$ and numerical examples on the recovery of tangent spaces through spectral analysis.
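To make the tangent-space recovery step concrete, the following is a minimal one-dimensional sketch, not the paper's implementation. It rests on several assumptions that do not come from the abstract: measures are represented by equally weighted atoms, the optimal map in 1D is taken to be the monotone rearrangement, the family $\Lambda$ is a hypothetical two-parameter scale-and-shift family, and the ``covariance operator'' is taken to be the uncentered second-moment operator of the displacement fields $T_i - \mathrm{id}$ in $L^2(\lambda)$. The function names `w2_1d` and `displacement_1d` are illustrative.

```python
import numpy as np


def w2_1d(x, y):
    """Wasserstein-2 distance between two 1D empirical measures
    with the same number of equally weighted atoms."""
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))


def displacement_1d(x_ref, y):
    """Displacement T(x) - x of the monotone (optimal in 1D) map from the
    atoms of x_ref to the atoms of y, evaluated at the sorted atoms of x_ref."""
    return np.sort(y) - np.sort(x_ref)


rng = np.random.default_rng(0)
n, N = 200, 50  # atoms per measure, number of neighbouring samples

# Reference measure lambda: n equally weighted atoms on [0.4, 0.6].
x_ref = np.linspace(0.4, 0.6, n)

# Hypothetical two-parameter family (assumption): push lambda forward by
# x -> a * x + b with (a, b) close to (1, 0), so the samples are close to lambda.
a = 1.0 + 0.1 * rng.uniform(-1.0, 1.0, N)
b = 0.05 * rng.uniform(-1.0, 1.0, N)
samples = [a_i * x_ref + b_i for a_i, b_i in zip(a, b)]

# Pairwise extrinsic distances from lambda to each sample.
dists = np.array([w2_1d(x_ref, y) for y in samples])

# Displacement fields v_i = T_i - id, one row per sample.
V = np.stack([displacement_1d(x_ref, y) for y in samples])  # shape (N, n)

# Assumed covariance operator: C f = (1/N) * sum_i <v_i, f>_{L^2(lambda)} v_i,
# where the L^2(lambda) inner product uses uniform weights 1/n.
C = V.T @ V / (N * n)

# Eigenvalues in descending order; the leading eigenvectors span the
# estimated tangent space at lambda.
eigvals, eigvecs = np.linalg.eigh(C)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
tangent_basis = eigvecs[:, :2]
```

In this toy family every displacement field has the form $(a_i - 1)x + b_i$, so the spectrum should show two dominant eigenvalues followed by values at the level of floating-point noise, which matches the intrinsic dimension two of the family. In higher dimensions the sorting step would have to be replaced by an optimal transport solver.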