Manifold learning in Wasserstein space

This paper aims at building the theoretical foundations for manifold learning algorithms in the space of absolutely continuous probability measures on a compact and convex subset of $\mathbb{R}^d$, metrized with the Wasserstein-2 distance $W$. We begin by introducing a natural construction of submanifolds $\Lambda$ of probability measures equipped with metric $W_\Lambda$, the geodesic restriction of $W$ to $\Lambda$. In contrast to other constructions, these submanifolds are not necessarily flat, but still allow for local linearizations in a similar fashion to Riemannian submanifolds of $\mathbb{R}^d$. We then show how the latent manifold structure of $(\Lambda,W_{\Lambda})$ can be learned from samples $\{\lambda_i\}_{i=1}^N$ of $\Lambda$ and pairwise extrinsic Wasserstein distances $W$ only. In particular, we show that the metric space $(\Lambda,W_{\Lambda})$ can be asymptotically recovered in the sense of Gromov--Wasserstein from a graph with nodes $\{\lambda_i\}_{i=1}^N$ and edge weights $W(\lambda_i,\lambda_j)$. In addition, we demonstrate how the tangent space at a sample $\lambda$ can be asymptotically recovered via spectral analysis of a suitable "covariance operator" using optimal transport maps from $\lambda$ to sufficiently close and diverse samples $\{\lambda_i\}_{i=1}^N$. The paper closes with some explicit constructions of submanifolds $\Lambda$ and numerical examples on the recovery of tangent spaces through spectral analysis.
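
As a rough illustration of the two recovery steps described above, the following is a minimal sketch on a toy family that is not taken from the paper: all translates of one fixed measure on $\mathbb{R}^2$ by shifts $a$ on the unit circle (ignoring the compact-domain requirement for simplicity). For such translates, $W(\lambda_a,\lambda_b)=|a-b|$ and the optimal transport map is $x \mapsto x + (b-a)$, so $W_\Lambda$ is arc length and every displacement field is a constant vector. The neighborhood radius `eps`, the shortest-path estimator, and all function names are assumptions made for this illustration; the paper's actual graph construction and covariance operator may differ.

```python
import math

def circle_translates(n):
    """Shifts a_i on the unit circle; lambda_i is a fixed measure translated by a_i."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n)]

def w2_translates(a, b):
    # For two translates of the same measure, W2 equals the distance between shifts.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def graph_geodesics(shifts, eps):
    """Shortest-path distances on the graph whose edges join samples with W <= eps.

    Assumption: an eps-neighborhood graph with Floyd-Warshall shortest paths,
    used here as a stand-in estimator of the intrinsic metric W_Lambda.
    """
    n = len(shifts)
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
        for j in range(n):
            w = w2_translates(shifts[i], shifts[j])
            if w <= eps:
                d[i][j] = w
    for k in range(n):
        dk = d[k]
        for i in range(n):
            dik = d[i][k]
            if dik == math.inf:
                continue
            di = d[i]
            for j in range(n):
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]
    return d

def tangent_direction(shifts, idx, eps):
    """Leading eigenvector of the 2x2 covariance of displacements T_i - id.

    In this toy family each displacement is the constant vector a_i - a, so the
    covariance operator reduces to a 2x2 matrix (an assumption of the example).
    """
    a = shifts[idx]
    cxx = cxy = cyy = 0.0
    for b in shifts:
        if 0.0 < w2_translates(a, b) <= eps:
            vx, vy = b[0] - a[0], b[1] - a[1]
            cxx += vx * vx
            cxy += vx * vy
            cyy += vy * vy
    # Orientation of the principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    return (math.cos(theta), math.sin(theta))

if __name__ == "__main__":
    shifts = circle_translates(120)
    d = graph_geodesics(shifts, eps=0.2)
    print("extrinsic W between antipodal samples:", w2_translates(shifts[0], shifts[60]))
    print("graph estimate of W_Lambda:", d[0][60])
    print("estimated tangent direction at sample 0:", tangent_direction(shifts, 0, 0.2))
```

In this toy case the graph distance between antipodal samples should be close to the arc length $\pi$ rather than the extrinsic distance $2$, and the estimated tangent direction at the shift $(1,0)$ should be close to $(0,\pm 1)$.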


