We consider the problem of computing persistent homology (PH) for large-scale Euclidean point cloud data, aimed at downstream machine learning tasks, where the exponential growth of the most widely used Vietoris-Rips complex imposes serious computational limitations. Although more scalable alternatives such as the Alpha complex or sparse Rips approximations exist, they often still produce a prohibitively large number of simplices. This poses challenges both for constructing the complex and for the subsequent PH computation, precluding their use on large-scale point clouds. To mitigate these issues, we introduce the Flood complex, inspired by the advantages of the Alpha and Witness complex constructions. Informally, at a given filtration value $r\geq 0$, the Flood complex contains all simplices from a Delaunay triangulation of a small subset of the point cloud $X$ that are fully covered by balls of radius $r$ centered at the points of $X$, a process we call flooding. Our construction allows for efficient PH computation, possesses several desirable theoretical properties, and is amenable to GPU parallelization. Scaling experiments on 3D point cloud data show that we can compute PH up to dimension 2 on several million points. Importantly, when evaluating object classification performance on real-world and synthetic data, we provide evidence that this scaling capability is needed, especially if objects are geometrically or topologically complex, yielding performance superior to other PH-based methods and to neural networks for point cloud data. Source code and datasets are available at https://github.com/plus-rkwitt/flooder.
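To make the flooding condition concrete, here is a minimal sketch of how one might assign a filtration value to a single simplex. It assumes the small landmark subset and its Delaunay triangulation (for example, from `scipy.spatial.Delaunay`) have already been computed. The value of a simplex $\sigma$ is taken as $\max_{y\in\sigma}\min_{x\in X}\lVert y-x\rVert$, which is the smallest $r$ at which balls of radius $r$ centered at $X$ cover $\sigma$. This supremum is approximated by sampling $\sigma$ on a regular barycentric grid. The helpers `barycentric_grid` and `flood_value` and the `resolution` parameter are hypothetical names for illustration. This is not the flooder library's API or its actual implementation.

```python
import itertools
import math


def barycentric_grid(num_vertices, resolution):
    """Yield barycentric weights (w_0, ..., w_k) with w_i = n_i / resolution,
    where the n_i are non-negative integers summing to `resolution`.

    Stars and bars: each non-decreasing tuple of k cut points in
    [0, resolution] corresponds to exactly one composition of `resolution`
    into k + 1 non-negative parts.
    """
    for cuts in itertools.combinations_with_replacement(
        range(resolution + 1), num_vertices - 1
    ):
        bounds = (0,) + cuts + (resolution,)
        yield tuple(
            (bounds[i + 1] - bounds[i]) / resolution for i in range(num_vertices)
        )


def flood_value(simplex, cloud, resolution=8):
    """Approximate the smallest r such that `simplex` (a list of vertex
    coordinates) lies inside the union of radius-r balls centered at `cloud`.

    Sampling only a finite grid of points on the simplex gives a lower bound
    on the exact covering radius.
    """
    dim = len(simplex[0])
    worst = 0.0
    for weights in barycentric_grid(len(simplex), resolution):
        # Map barycentric weights to a point y on the simplex.
        y = [sum(w * v[d] for w, v in zip(weights, simplex)) for d in range(dim)]
        # Distance from y to its nearest point in the cloud.
        nearest = min(math.dist(y, x) for x in cloud)
        worst = max(worst, nearest)
    return worst


if __name__ == "__main__":
    cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    # The point (0.5, 0.5) is farthest from the cloud, at distance sqrt(0.5).
    print(round(flood_value(triangle, cloud), 4))  # 0.7071
```

Using the same `resolution` for every simplex means each face's grid is a subset of its cofaces' grids, so the approximate values stay monotone ($f(\tau)\le f(\sigma)$ for $\tau\subseteq\sigma$) and still define a valid filtration. The per-point nearest-neighbor queries are independent, which is the kind of workload that parallelizes well on a GPU.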