Persistent homology is a popular computational tool for analyzing the topology of point clouds, such as the presence of loops or voids. However, many real-world datasets with low intrinsic dimensionality reside in an ambient space of much higher dimensionality. We show that in this setting traditional persistent homology becomes very sensitive to noise and fails to detect the correct topology. The same holds for existing refinements of persistent homology. As a remedy, we find that spectral distances on the $k$-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, make it possible to detect the correct topology even in the presence of high-dimensional noise. Moreover, we derive a novel closed-form formula for effective resistance and describe its relation to diffusion distances. Finally, we apply these methods to high-dimensional single-cell RNA-sequencing data and show that spectral distances allow robust detection of cell cycle loops.
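To make the notion of a spectral distance on the $k$-nearest-neighbor graph concrete, the following is a minimal sketch, not the method of this work: it uses the standard textbook expression for effective resistance through the Moore-Penrose pseudoinverse $L^{+}$ of the graph Laplacian, $R_{ij} = L^{+}_{ii} + L^{+}_{jj} - 2L^{+}_{ij}$, rather than the closed-form formula derived here. The function names `knn_adjacency` and `effective_resistance`, the unweighted symmetrized graph, and the toy noisy circle with its parameters are all illustrative assumptions, and the formula is only meaningful when the graph is connected.

```python
import numpy as np


def knn_adjacency(X, k):
    """Symmetric, unweighted k-nearest-neighbor adjacency matrix (illustrative helper)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of points.
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(D2, np.inf)  # a point is not its own neighbor
    idx = np.argsort(D2, axis=1)[:, :k]  # indices of the k nearest neighbors
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    # Symmetrize: keep an edge if either endpoint lists the other as a neighbor.
    return np.maximum(A, A.T)


def effective_resistance(A):
    """Pairwise effective resistance from an adjacency matrix (assumes a connected graph)."""
    L = np.diag(A.sum(axis=1)) - A  # unnormalized graph Laplacian
    L_pinv = np.linalg.pinv(L)      # Moore-Penrose pseudoinverse
    d = np.diag(L_pinv)
    return d[:, None] + d[None, :] - 2.0 * L_pinv


# Toy data (assumed for illustration): a circle embedded in 50 dimensions
# with Gaussian noise added to every coordinate.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
X = np.zeros((200, 50))
X[:, 0] = np.cos(theta)
X[:, 1] = np.sin(theta)
X += 0.1 * rng.normal(size=X.shape)

A = knn_adjacency(X, k=15)
R = effective_resistance(A)  # (200, 200) matrix of pairwise effective resistances
```

The resulting matrix `R` could then be passed, in place of Euclidean distances, to any persistent homology package that accepts a precomputed distance matrix.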