In many applications in data clustering, it is desirable to find not just a single partition into clusters but a sequence of partitions describing the data at different scales (or levels of coarseness). A natural problem then is to analyse and compare the (not necessarily hierarchical) sequences of partitions that underpin multiscale descriptions of data. Here, we introduce the Multiscale Clustering Filtration (MCF), a well-defined and stable filtration of abstract simplicial complexes that encodes arbitrary patterns of cluster assignments across scales of increasing coarseness. We show that the zero-dimensional persistent homology of the MCF measures the degree of hierarchy in the sequence of partitions, and the higher-dimensional persistent homology tracks the emergence and resolution of conflicts between cluster assignments across the sequence of partitions. To broaden the theoretical foundations of the MCF, we also provide an equivalent construction via a nerve complex filtration, and we show that in the hierarchical case, the MCF reduces to a Vietoris-Rips filtration of an ultrametric space. We then use numerical experiments to illustrate how the MCF can serve to characterise multiscale clusterings of synthetic data from stochastic block models.
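As a rough illustration of the idea, the sketch below assigns each simplex the index of the first partition in which a single cluster contains all of its vertices. This entry rule is an assumption about how the MCF is built, since the abstract does not spell out the construction. The function names (`first_shared_cluster_scale`, `mcf_simplices`, `is_hierarchical`) and the toy partitions are hypothetical. Partition indices stand in for the scale parameter.

```python
from itertools import combinations


def first_shared_cluster_scale(partitions, simplex):
    """Index of the first partition in which one cluster contains every
    vertex of `simplex`, or None if no cluster ever does."""
    for t, partition in enumerate(partitions):
        if any(simplex <= cluster for cluster in partition):
            return t
    return None


def mcf_simplices(partitions, max_dim=2):
    """Map each simplex (as a sorted tuple of points) up to `max_dim` to the
    scale index at which it enters the filtration (assumed entry rule)."""
    points = sorted(set().union(*partitions[0]))
    filtration = {}
    for size in range(1, max_dim + 2):  # a k-simplex has k + 1 vertices
        for verts in combinations(points, size):
            t = first_shared_cluster_scale(partitions, frozenset(verts))
            if t is not None:
                filtration[verts] = t
    return filtration


def is_hierarchical(partitions):
    """True if every cluster is contained in some cluster of the next,
    coarser partition in the sequence."""
    for finer, coarser in zip(partitions, partitions[1:]):
        for cluster in finer:
            if not any(cluster <= c for c in coarser):
                return False
    return True


# Toy non-hierarchical sequence on three points (hypothetical data).
partitions = [
    [frozenset({0}), frozenset({1}), frozenset({2})],
    [frozenset({0, 1}), frozenset({2})],
    [frozenset({0}), frozenset({1, 2})],
    [frozenset({0, 2}), frozenset({1})],
    [frozenset({0, 1, 2})],
]

filtration = mcf_simplices(partitions)
print(is_hierarchical(partitions))
print(filtration)
```

In this toy sequence the three edges enter at different scales and the triangle enters only at the final partition, so the edges close a loop before it is filled in. This is the kind of conflict between cluster assignments that the higher-dimensional persistent homology is said to track. The simplex-to-scale map could then be handed to a persistent homology library to compute the actual barcodes.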