This paper presents two efficient hierarchical clustering (HC) algorithms with respect to Dasgupta's cost function. For any input graph $G$ with a clear cluster structure, our algorithms run in time nearly linear in the size of $G$ and return an $O(1)$-approximate HC tree with respect to Dasgupta's cost function. We compare our algorithms against the previous state of the art on synthetic and real-world datasets and show that they produce comparable or better HC trees with much lower running time.
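For readers unfamiliar with the objective, Dasgupta's cost of an HC tree $T$ on a weighted graph $G$ is the sum, over all edges $\{u,v\}$, of the edge weight $w(u,v)$ times the number of leaves under the lowest common ancestor of $u$ and $v$ in $T$; lower is better. The following minimal sketch only evaluates this cost for a given tree. It is not one of the algorithms presented in the paper, and it runs in quadratic time rather than nearly-linear time. The nested-tuple tree encoding and the names `leaves`, `dasgupta_cost`, `weights`, `good_tree`, and `bad_tree` are assumptions made for illustration.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree.

    Assumption: internal nodes are tuples of children and leaves are
    non-tuple hashable labels (here, ints).
    """
    if not isinstance(tree, tuple):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def dasgupta_cost(tree, weights):
    """Sum over edges of w(u, v) * (number of leaves under LCA(u, v))."""
    if not isinstance(tree, tuple):
        return 0.0  # a single leaf separates no edge
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(s) for s in child_leaves)
    cost = 0.0
    # An edge whose endpoints lie in different children has this node as
    # its lowest common ancestor, so it is charged `size` leaves.
    for a, b in combinations(child_leaves, 2):
        for u in a:
            for v in b:
                w = weights.get((u, v), weights.get((v, u), 0.0))
                cost += w * size
    # Edges inside a single child are charged deeper in the tree.
    for child in tree:
        cost += dasgupta_cost(child, weights)
    return cost


# Toy graph: two tightly connected pairs {0, 1} and {2, 3}, weakly linked.
weights = {(0, 1): 1.0, (2, 3): 1.0, (1, 2): 0.5}

good_tree = ((0, 1), (2, 3))  # cuts the weak edge first
bad_tree = ((0, 2), (1, 3))   # cuts both heavy edges at the root

print(dasgupta_cost(good_tree, weights))  # 6.0
print(dasgupta_cost(bad_tree, weights))   # 10.0
```

On this toy graph, the tree that separates the two tight pairs at the root has the lower cost, which matches the intuition that a good HC tree cuts heavy edges as low in the tree as possible.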