Topological methods have the potential to explore data clouds without making assumptions about their structure. Here we propose a hierarchical topological clustering algorithm that can be implemented with any choice of distance. The persistence of outliers and of clusters of arbitrary shape is inferred from the resulting hierarchy. We demonstrate the potential of the algorithm on selected datasets in which outliers play a relevant role, comprising image, medical, and economic data. These methods can provide meaningful clusters in situations where other techniques fail to do so.