We consider the degree-Rips construction from topological data analysis, which provides a density-sensitive, multiparameter hierarchical clustering algorithm. We analyze its stability to perturbations of the input data using the correspondence-interleaving distance, a metric for hierarchical clusterings that we introduce. Taking certain one-parameter slices of degree-Rips recovers well-known methods for density-based clustering, but we show that these methods are unstable. However, we prove that degree-Rips, as a multiparameter object, is stable, and we propose an alternative approach for taking slices of degree-Rips, which yields a one-parameter hierarchical clustering algorithm with better stability properties. Using the correspondence-interleaving distance, we prove that this algorithm is consistent. We also provide an algorithm for extracting a single clustering from a one-parameter hierarchical clustering, and we show that this extraction algorithm is stable with respect to the correspondence-interleaving distance. Finally, we integrate these methods into a pipeline for density-based clustering, which we call Persistable. Adapting tools from multiparameter persistent homology, we propose visualization tools that guide the selection of all parameters of the pipeline. We demonstrate Persistable on benchmark datasets, showing that it identifies multi-scale cluster structure in data.
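To make the degree-Rips construction more concrete, the following is a minimal sketch, not Persistable's implementation, that computes the clusters at a single parameter pair (r, k). It assumes Euclidean points and the convention that a point is kept when at least k *other* points lie within distance r of it (conventions for counting the degree vary). The clusters are the connected components of the Rips graph restricted to the kept points. The function name `degree_rips_clusters` is hypothetical.

```python
import math
from itertools import combinations


def degree_rips_clusters(points, r, k):
    """Return clusters of the degree-Rips graph at scale r and degree threshold k.

    Hypothetical helper for illustration. Assumed convention: a point is kept
    if at least k *other* points lie within Euclidean distance r of it.
    """
    n = len(points)
    # Build the Rips graph at scale r: an edge joins points at distance <= r.
    nbrs = [[] for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if math.dist(points[i], points[j]) <= r:
            nbrs[i].append(j)
            nbrs[j].append(i)

    # Keep only the vertices whose degree in the Rips graph is at least k.
    kept = [i for i in range(n) if len(nbrs[i]) >= k]
    kept_set = set(kept)

    # Union-find over the kept vertices to get connected components.
    parent = {i: i for i in kept}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in kept:
        for j in nbrs[i]:
            if j in kept_set:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    # Group kept vertices by their component root.
    clusters = {}
    for i in kept:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values())


# Two dense unit squares plus one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (5, 5)]
print(degree_rips_clusters(pts, r=1.5, k=2))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With `k=2` the isolated point is discarded as low-density, while with `k=0` it survives as a singleton cluster. Lowering r or raising k can only remove vertices and edges, which is why sweeping both parameters produces a two-parameter hierarchy; fixing one parameter yields a one-parameter slice.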