Persistent homology is a methodology central to topological data analysis that extracts and summarizes the topological features of a dataset as a persistence diagram; it has recently gained popularity through successful applications in many domains. However, its algebraic construction induces a metric space of persistence diagrams with a highly complex geometry. In this paper, we prove convergence of the $k$-means clustering algorithm on persistence diagram space and establish theoretical properties of the solution to the optimization problem in the Karush--Kuhn--Tucker framework. Additionally, we perform numerical experiments on various representations of persistent homology, including embeddings of persistence diagrams as well as the diagrams themselves and their generalizations as persistence measures; we find that clustering directly on persistence diagrams and measures outperforms clustering on their vectorized representations.
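To make the clustering setting concrete, the following is a minimal sketch of the assignment step of $k$-means on persistence diagrams. It is not the authors' implementation. It assumes the 2-Wasserstein distance with a Euclidean ground metric, which the abstract does not specify, and the names `wasserstein2` and `assign_to_centers` are hypothetical. The matching is found by brute force over permutations, so it is only usable for very small diagrams; the update step (a Fréchet mean of diagrams) is omitted.

```python
from itertools import permutations


def wasserstein2(diag_a, diag_b):
    """2-Wasserstein distance between two small persistence diagrams.

    Each diagram is a list of (birth, death) pairs. Points may be matched
    to the diagonal, which lets diagrams of different sizes be compared.
    Assumption: Euclidean ground metric; brute-force optimal matching.
    """
    n, m = len(diag_a), len(diag_b)
    size = n + m
    if size == 0:
        return 0.0

    def to_diagonal(p):
        # Squared Euclidean distance from p to its projection on the diagonal.
        return (p[1] - p[0]) ** 2 / 2.0

    def cost(i, j):
        # Rows 0..n-1 are points of diag_a, rows n.. are diagonal slots.
        # Columns 0..m-1 are points of diag_b, columns m.. are diagonal slots.
        if i < n and j < m:
            (b1, d1), (b2, d2) = diag_a[i], diag_b[j]
            return (b1 - b2) ** 2 + (d1 - d2) ** 2
        if i < n:
            return to_diagonal(diag_a[i])
        if j < m:
            return to_diagonal(diag_b[j])
        return 0.0  # diagonal matched to diagonal costs nothing

    # Minimize the total squared cost over all perfect matchings.
    best = min(
        sum(cost(i, j) for i, j in enumerate(perm))
        for perm in permutations(range(size))
    )
    return best ** 0.5


def assign_to_centers(diagrams, centers):
    """Assignment step of k-means: index of the nearest center per diagram."""
    return [
        min(range(len(centers)), key=lambda k: wasserstein2(d, centers[k]))
        for d in diagrams
    ]
```

For diagrams of realistic size, the brute-force search would be replaced by an assignment solver such as `scipy.optimize.linear_sum_assignment` applied to the same cost matrix.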