We introduce ORC-ManL, a new algorithm that prunes spurious edges from nearest-neighbor graphs using a criterion based on Ollivier-Ricci curvature and estimated metric distortion. Our motivation comes from manifold learning: we show that when the data generating the nearest-neighbor graph consists of noisy samples from a low-dimensional manifold, edges that shortcut through the ambient space have more negative Ollivier-Ricci curvature than edges that lie along the data manifold. We demonstrate that our method outperforms alternative pruning methods and that it significantly improves performance on many downstream geometric data analysis tasks that take nearest-neighbor graphs as input. Specifically, we evaluate on tasks including manifold learning, persistent homology, and dimension estimation. We also show that ORC-ManL can be used to improve clustering and manifold learning of single-cell RNA sequencing data. Finally, we provide empirical convergence experiments that support our theoretical findings.
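To make the curvature criterion concrete, the following toy sketch computes Ollivier-Ricci curvature on a small graph and flags a shortcut edge. It is not the authors' implementation and rests on several assumptions: the graph is an unweighted path with one artificial shortcut rather than a nearest-neighbor graph built from noisy manifold samples, each node's measure is taken to be uniform over its neighbors, and the function names and both thresholds (`curvature_threshold`, `distortion_threshold`) are hypothetical values chosen for illustration.

```python
from collections import deque
from itertools import permutations
from math import gcd


def build_adjacency(nodes, edges):
    """Undirected adjacency sets for an unweighted graph."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Hop-count shortest-path distances from `source` to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wasserstein_uniform(support_x, support_y, dist):
    """Exact W1 distance between uniform measures on two small node sets.

    Both measures are split into lcm(|X|, |Y|) equal atoms, which turns
    optimal transport into an assignment problem solved by brute force.
    This is only practical for toy graphs with very small degrees.
    """
    a, b = len(support_x), len(support_y)
    n = a * b // gcd(a, b)
    atoms_x = [u for u in support_x for _ in range(n // a)]
    atoms_y = [v for v in support_y for _ in range(n // b)]
    best = min(
        sum(dist[u][v] for u, v in zip(atoms_x, perm))
        for perm in permutations(atoms_y)
    )
    return best / n


def ollivier_ricci(adj, edges):
    """Curvature 1 - W1(mu_u, mu_v) / d(u, v) for every edge (u, v)."""
    dist = {n: bfs_distances(adj, n) for n in adj}
    return {
        (u, v): 1.0
        - wasserstein_uniform(sorted(adj[u]), sorted(adj[v]), dist) / dist[u][v]
        for u, v in edges
    }


def prune(nodes, edges, curvature_threshold=-0.5, distortion_threshold=4):
    """Flag very negatively curved edges, then prune those whose endpoints
    are far apart once all flagged edges are removed (both thresholds are
    hypothetical, not values from the paper)."""
    curvature = ollivier_ricci(build_adjacency(nodes, edges), edges)
    candidates = [e for e in edges if curvature[e] < curvature_threshold]
    kept = [e for e in edges if e not in candidates]
    adj_kept = build_adjacency(nodes, kept)
    pruned = []
    for u, v in candidates:
        detour = bfs_distances(adj_kept, u).get(v, float("inf"))
        if detour > distortion_threshold:
            pruned.append((u, v))
    return curvature, pruned


# A 12-node path (the "manifold") plus one shortcut edge between nodes 2 and 9.
nodes = list(range(12))
edges = [(i, i + 1) for i in range(11)] + [(2, 9)]
curvature, pruned = prune(nodes, edges)
print(pruned)  # [(2, 9)]
```

In this toy graph the shortcut has curvature -2/3, the path edges touching its endpoints have curvature -1/3, and the remaining path edges have curvature 0, so only the shortcut falls below the illustrative threshold. Removing it leaves its endpoints 7 hops apart, which is the kind of distortion the second check looks for. A realistic version would use edge weights from the point cloud and a proper optimal transport solver instead of brute-force assignment.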