Persistent homology is one of the most popular methods in Topological Data Analysis. An initial step in any analysis with persistent homology is to construct a nested sequence of simplicial complexes, called a filtration, from a point cloud. There is an abundance of complexes to choose from, with Rips, Alpha, and witness complexes being popular choices. In this manuscript, we build a different type of geometrically informed simplicial complex, called an ellipsoid complex. This complex is based on the idea that ellipsoids aligned with tangent directions approximate the data better than the conventional (Euclidean) balls centered at sample points that are used in the construction of, for instance, Rips and Alpha complexes. We use Principal Component Analysis to estimate tangent spaces directly from samples, and we present algorithms and an implementation for computing ellipsoid barcodes, i.e., topological descriptors based on ellipsoid complexes. Furthermore, we conduct extensive experiments and compare ellipsoid barcodes with standard Rips barcodes. Our findings indicate that ellipsoid complexes are particularly effective for estimating, from samples, the homology of manifolds and of spaces with bottlenecks. In particular, the persistence intervals corresponding to a ground-truth topological feature are longer than the intervals obtained from the Rips complex of the same data. Finally, we demonstrate that ellipsoid barcodes outperform Rips barcodes in classification tasks, including on sparsely sampled point clouds.
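As an illustration of the tangent-aligned idea described above, the following minimal Python sketch estimates principal directions at a sample point by PCA on its nearest neighbours and tests whether a query point lies inside an ellipsoid elongated along the estimated tangent. It is not the implementation from the manuscript: the function names `local_pca_axes` and `in_ellipsoid`, the neighbourhood size `k`, and the 3:1 axis ratio are all assumptions chosen for the example, and the sketch does not build the filtration itself.

```python
import numpy as np

def local_pca_axes(points, index, k=10):
    """Principal directions (as rows) at points[index], from its k nearest neighbours.

    Hypothetical helper: the neighbourhood size k is an assumption of this sketch.
    """
    p = points[index]
    dists = np.linalg.norm(points - p, axis=1)
    neighbours = points[np.argsort(dists)[: k + 1]]  # includes the point itself
    centred = neighbours - neighbours.mean(axis=0)
    # Right singular vectors of the centred neighbourhood are the PCA
    # directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt

def in_ellipsoid(x, centre, axes, semi_axes):
    """True if x lies in the ellipsoid with the given centre, orthonormal
    axes (rows), and semi-axis lengths (one per axis)."""
    coords = axes @ (np.asarray(x, dtype=float) - centre)
    return float(np.sum((coords / semi_axes) ** 2)) <= 1.0

# Samples on the unit circle; the tangent at (1, 0) is the y-direction.
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])

axes = local_pca_axes(circle, index=0, k=4)

# Assumed 3:1 ratio between tangent and normal semi-axes at scale 0.3.
semi_axes = 0.3 * np.array([1.0, 1.0 / 3.0])

along_tangent = in_ellipsoid([1.0, 0.25], circle[0], axes, semi_axes)
along_normal = in_ellipsoid([0.75, 0.0], circle[0], axes, semi_axes)
print(along_tangent, along_normal)
```

Both query points lie at distance 0.25 from the centre, so a Euclidean ball of radius 0.3 would contain both; the ellipsoid keeps only the one displaced along the estimated tangent direction.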