In many applications in data clustering, it is desirable to find not just a single partition into clusters but a sequence of partitions describing the data at different scales, or levels of coarseness. A natural problem then is to analyse and compare the (not necessarily hierarchical) sequences of partitions that underpin such multiscale descriptions of data. Here, we introduce a filtration of abstract simplicial complexes, denoted the Multiscale Clustering Filtration (MCF), which encodes arbitrary patterns of cluster assignments across scales, and we prove that the MCF produces stable persistence diagrams. We then show that the zero-dimensional persistent homology of the MCF measures the degree of hierarchy in the sequence of partitions, and that the higher-dimensional persistent homology tracks the emergence and resolution of conflicts between cluster assignments across the sequence of partitions. To broaden the theoretical foundations of the MCF, we also provide an equivalent construction via a nerve complex filtration, and we show that in the hierarchical case, the MCF reduces to a Vietoris-Rips filtration of an ultrametric space. We briefly illustrate how the MCF can serve to characterise multiscale clustering structures in numerical experiments on synthetic data.
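As an illustration of how such a filtration might be built from a sequence of partitions, here is a minimal Python sketch. The abstract does not spell out the construction, so the sketch assumes that the complex at scale index t is the union, over all partitions up to t, of the solid simplices spanned by each cluster. The function names `mcf_simplices` and `betti0_by_scale` and the `max_dim` parameter are hypothetical and chosen only for this example. Under this assumption, the number of connected components at each scale can be compared with the number of clusters: the two agree at every scale for a hierarchical sequence and can differ otherwise.

```python
from itertools import combinations


def mcf_simplices(partitions, max_dim=2):
    """Map each simplex (a sorted tuple of points) to the first scale index
    at which it appears, ASSUMING every cluster spans a solid simplex.
    Only simplices up to dimension `max_dim` are enumerated."""
    birth = {}
    for t, partition in enumerate(partitions):
        for cluster in partition:
            pts = sorted(cluster)
            for k in range(1, max_dim + 2):  # k vertices -> dimension k - 1
                for simplex in combinations(pts, k):
                    birth.setdefault(simplex, t)  # keep the earliest birth
    return birth


def betti0_by_scale(partitions):
    """Number of connected components of the assumed complex at each scale,
    computed with a union-find over the points."""
    points = {p for partition in partitions for c in partition for p in c}
    parent = {p: p for p in points}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    n_components = len(points)
    counts = []
    for partition in partitions:
        for cluster in partition:
            pts = list(cluster)
            for q in pts[1:]:
                ra, rb = find(pts[0]), find(q)
                if ra != rb:  # merging two components
                    parent[ra] = rb
                    n_components -= 1
        counts.append(n_components)
    return counts


# Hierarchical sequence: clusters only merge as the scale grows.
hierarchical = [[{1}, {2}, {3}], [{1, 2}, {3}], [{1, 2, 3}]]
# Non-hierarchical sequence: point 2 switches cluster between scales.
non_hierarchical = [[{1, 2}, {3}], [{1}, {2, 3}]]

hier_b0 = betti0_by_scale(hierarchical)        # [3, 2, 1]
non_hier_b0 = betti0_by_scale(non_hierarchical)  # [2, 1]
```

In the hierarchical example the component counts [3, 2, 1] match the cluster counts at every scale. In the non-hierarchical example the second partition still has two clusters, but only one component remains, which is the kind of discrepancy the zero-dimensional persistent homology is described as measuring. Computing higher-dimensional persistence from the simplices would require a persistent homology library and is not attempted here.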