Persistent homology is a methodology central to topological data analysis that extracts and summarizes the topological features within a dataset as a persistence diagram; it has recently gained popularity through successful applications in many domains. However, its algebraic construction induces a metric space of persistence diagrams with a highly complex geometry. In this paper, we prove convergence of the $k$-means clustering algorithm on persistence diagram space and establish theoretical properties of the solution to the optimization problem in the Karush--Kuhn--Tucker framework. Additionally, we perform numerical experiments on various representations of persistent homology, including embeddings of persistence diagrams as well as the diagrams themselves and their generalizations as persistence measures; we find that clustering directly on persistence diagrams and measures outperforms clustering on their vectorized representations.
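For concreteness, the following is a minimal Python sketch of Lloyd-style $k$-means on persistence diagrams. It is not the authors' implementation and does not reproduce their Karush--Kuhn--Tucker analysis. It makes several assumptions. The distance is the squared 2-Wasserstein distance with a squared Euclidean ground cost. Diagrams are finite NumPy arrays of (birth, death) pairs. The centroid update is a simplified Fréchet-mean iteration, in the spirit of Turner et al. (2014), that keeps the number of points in each centroid fixed. Initialization is a deterministic farthest-point rule. The function names `optimal_matching`, `w2_squared`, `frechet_mean`, and `kmeans_diagrams` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def optimal_matching(X, Y):
    """Optimal partial matching between diagrams X (n, 2) and Y (m, 2).

    Ground cost is squared Euclidean distance; any point may instead be sent to
    its orthogonal projection on the diagonal. Returns the total cost (squared
    2-Wasserstein distance) and `match`, where match[i] = j if X[i] is paired
    with Y[j], or -1 if X[i] is sent to the diagonal.
    """
    n, m = len(X), len(Y)
    C = np.full((n + m, m + n), np.inf)
    # Rows 0..n-1: points of X; rows n..n+m-1: diagonal slots for Y.
    # Cols 0..m-1: points of Y; cols m..m+n-1: diagonal slots for X.
    C[:n, :m] = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    C[np.arange(n), m + np.arange(n)] = 0.5 * (X[:, 1] - X[:, 0]) ** 2
    C[n + np.arange(m), np.arange(m)] = 0.5 * (Y[:, 1] - Y[:, 0]) ** 2
    C[n:, m:] = 0.0  # diagonal-to-diagonal pairs are free
    rows, cols = linear_sum_assignment(C)
    match = np.full(n, -1)
    for r, c in zip(rows, cols):
        if r < n and c < m:
            match[r] = c
    return float(C[rows, cols].sum()), match


def w2_squared(X, Y):
    return optimal_matching(X, Y)[0]


def frechet_mean(diagrams, Y, n_iter=20):
    """Simplified Frechet-mean iteration; the centroid keeps len(Y) points."""
    Y = np.asarray(Y, dtype=float)
    for _ in range(n_iter):
        total = np.zeros_like(Y)
        for D in diagrams:
            _, match = optimal_matching(D, Y)
            mid = 0.5 * (Y[:, 0] + Y[:, 1])
            partner = np.stack([mid, mid], axis=1)  # default: diagonal projection
            hit = match >= 0
            partner[match[hit]] = D[hit]  # matched off-diagonal points
            total += partner
        Y_new = total / len(diagrams)
        if np.allclose(Y_new, Y):
            return Y_new
        Y = Y_new
    return Y


def kmeans_diagrams(diagrams, k, max_iter=50):
    """Lloyd-style k-means on persistence diagrams under squared W2."""
    # Farthest-point initialization (an illustrative assumption).
    centroids = [np.asarray(diagrams[0], dtype=float)]
    while len(centroids) < k:
        gaps = [min(w2_squared(D, C) for C in centroids) for D in diagrams]
        centroids.append(np.asarray(diagrams[int(np.argmax(gaps))], dtype=float))
    labels = None
    for _ in range(max_iter):
        dists = np.array([[w2_squared(D, C) for C in centroids] for D in diagrams])
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [D for D, lab in zip(diagrams, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = frechet_mean(members, centroids[c])
    return labels, centroids
```

Each centroid keeps the cardinality of the diagram it was initialized from. A cluster whose members carry more off-diagonal points is therefore only approximated. A more complete barycenter computation would also let new centroid points emerge from the diagonal.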