We introduce LOT Wassmap, a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space. The algorithm is motivated by the observation that many datasets are naturally interpreted as probability measures rather than points in $\mathbb{R}^n$, and that finding low-dimensional descriptions of such datasets requires manifold learning algorithms in the Wasserstein space. Most available algorithms are based on computing the pairwise Wasserstein distance matrix, which can be computationally challenging for large datasets in high dimensions. Our algorithm leverages approximation schemes such as Sinkhorn distances and linearized optimal transport to speed up computations and, in particular, avoids computing a pairwise distance matrix. We provide guarantees on the embedding quality under such approximations, including when explicit descriptions of the probability measures are not available and one must deal with finite samples instead. Experiments demonstrate that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size. We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
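To make the idea of embedding measures without a pairwise distance matrix concrete, the following is a minimal sketch under strong simplifying assumptions, not the authors' implementation. It assumes one-dimensional measures given only through finite samples, where the optimal transport map from a uniform reference on $(0,1)$ is the quantile function, so no Sinkhorn or linear-program solve is needed. The function names `lot_embed_1d` and `lot_wassmap_1d` are hypothetical, and the use of a centered SVD to extract the low-dimensional coordinates is an assumption of this sketch.

```python
import numpy as np


def lot_embed_1d(samples, n_quantiles=200):
    """Hypothetical helper: approximate the 1D transport map from Uniform(0,1)
    to the empirical measure of `samples` by its quantiles on a fixed grid."""
    levels = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.quantile(samples, levels)


def lot_wassmap_1d(sample_sets, dim=2, n_quantiles=200):
    """Hypothetical helper: embed N sampled 1D measures into R^dim.

    Each measure is solved for once (N transport maps in total), instead of
    computing N*(N-1)/2 pairwise Wasserstein distances.
    """
    features = np.stack([lot_embed_1d(s, n_quantiles) for s in sample_sets])
    # Scale so that Euclidean distances between rows approximate W2 distances.
    features = features / np.sqrt(n_quantiles)
    # Center the rows, then keep the leading singular directions (assumed step).
    centered = features - features.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :dim] * s[:dim]


# Synthetic example: translates of a standard Gaussian, observed through samples.
rng = np.random.default_rng(0)
shifts = np.linspace(-2.0, 2.0, 9)
sample_sets = [rng.normal(loc=t, scale=1.0, size=2000) for t in shifts]
embedding = lot_wassmap_1d(sample_sets, dim=1)
```

In this toy setting the recovered one-dimensional coordinate should track the translation parameter up to sign and sampling error, which mirrors the claim that embedding quality improves with sample size. In higher dimensions the quantile step would have to be replaced by an approximate transport map, for instance one derived from a Sinkhorn plan.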