Linkage methods are among the most popular algorithms for hierarchical clustering. Despite their relevance, current knowledge of the quality of the clusterings these methods produce is limited. Here, we improve the currently available bounds on the maximum diameter of the clustering obtained by complete-link in metric spaces. One of our new bounds, in contrast to the existing ones, allows us to separate complete-link from single-link in terms of approximation for the diameter, which corroborates the common perception that the former is more suitable than the latter when the goal is to produce compact clusters. We also show that our techniques can be employed to derive upper bounds on the cohesion of a class of linkage methods that includes the popular average-link.