Persistent homology is a popular computational tool for detecting non-trivial topology of point clouds, such as the presence of loops or voids. However, many real-world datasets with low intrinsic dimensionality reside in an ambient space of much higher dimensionality. We show that in this case vanilla persistent homology becomes very sensitive to noise and fails to detect the correct topology. The same holds true for most existing refinements of persistent homology. As a remedy, we find that spectral distances on the $k$-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, allow persistent homology to detect the correct topology even in the presence of high-dimensional noise. Furthermore, we derive a novel closed-form expression for effective resistance in terms of the eigendecomposition of the graph Laplacian, and describe its relation to diffusion distances. Finally, we apply these methods to several high-dimensional single-cell RNA-sequencing datasets and show that spectral distances on the $k$-nearest-neighbor graph allow robust detection of cell cycle loops.
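To make the idea of a spectral distance on the $k$-nearest-neighbor graph concrete, the following is a minimal sketch of effective resistance computed from the eigendecomposition of the graph Laplacian. It uses the standard textbook identity for the unnormalized Laplacian $L = D - A$ of a connected graph, $R_{ij} = \sum_{m \ge 2} (u_m(i) - u_m(j))^2 / \lambda_m$, and is not the closed-form expression derived in the paper. The function names `knn_adjacency` and `effective_resistance`, the unweighted symmetric graph construction, the tolerance `tol`, and the noisy-circle toy data are all assumptions made for illustration.

```python
import numpy as np


def knn_adjacency(X, k):
    """Symmetric, unweighted k-nearest-neighbor adjacency matrix (dense).

    Hypothetical helper: an edge is kept if either point lists the other
    among its k nearest neighbors.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D2, np.inf)  # a point is not its own neighbor
    idx = np.argsort(D2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)


def effective_resistance(A, tol=1e-10):
    """All-pairs effective resistance from the eigenpairs of L = D - A.

    Assumes the graph is connected, so that exactly one eigenvalue is zero.
    """
    L = np.diag(A.sum(axis=1)) - A
    lam, U = np.linalg.eigh(L)
    keep = lam > tol  # drop the zero eigenvalue
    # Scale each eigenvector by 1/sqrt(lambda); the resistance is then the
    # squared Euclidean distance between rows of V.
    V = U[:, keep] / np.sqrt(lam[keep])
    sq = np.sum(V**2, axis=1)
    R = sq[:, None] + sq[None, :] - 2.0 * V @ V.T
    return np.maximum(R, 0.0)  # clip tiny negative round-off


# Toy usage: a circle embedded in 50 dimensions with isotropic Gaussian noise.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, size=200)
X = np.zeros((200, 50))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
X += 0.05 * rng.standard_normal(X.shape)

R = effective_resistance(knn_adjacency(X, k=15))
```

The resulting matrix `R` is symmetric with a zero diagonal, so it can be handed to any persistent homology package that accepts a precomputed distance matrix in place of Euclidean distances. The dense eigendecomposition is chosen for clarity only; for large datasets a sparse solver would be the more practical choice.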