We present AuToMATo, a novel parameter-free clustering algorithm based on persistent homology. AuToMATo combines the existing ToMATo clustering algorithm with a bootstrapping procedure in order to separate significant peaks of an estimated density function from non-significant ones. We perform a thorough comparison of AuToMATo against many other state-of-the-art clustering algorithms. We find not only that AuToMATo compares favorably against other parameter-free clustering algorithms, but also that in many instances it significantly outperforms other algorithms even under their best choice of parameters. AuToMATo is motivated by applications in topological data analysis, in particular the Mapper algorithm, where it is desirable to work with a parameter-free clustering algorithm. Indeed, we provide evidence that AuToMATo performs well when used with Mapper. Finally, we provide an open-source implementation of AuToMATo in Python that is fully compatible with the standard scikit-learn architecture.