Hierarchical clustering is a fundamental task in data analysis, but classical methods have long lacked a principled objective function. Dasgupta [STOC 2016] took an important step toward closing this gap by proposing a well-motivated objective function for cluster trees. Cohen-Addad et al. [J. ACM 2019] subsequently introduced the notion of admissibility: an objective function is admissible if, whenever the input similarity matrix admits generating trees, its minimizers are precisely those generating trees. They also gave a necessary and sufficient condition for admissibility within a family of objective functions based on aggregate intercluster similarity, which we refer to as sum-type objective functions. However, apart from Dasgupta's original objective function, no explicit admissible objective functions in this family were provided. In this paper, we study admissible objective functions for hierarchical clustering in two directions. For sum-type objective functions, we give a complete characterization when the scaling function is a symmetric polynomial of degree at most two, and we derive sufficient conditions for degree-three polynomials. We also show that the recursive sparsest cut algorithm achieves an $O(\phi)$-approximation for the admissible objective functions covered by our characterization, where $\phi$ is the approximation factor of the sparsest cut subroutine. We then introduce max-type objective functions, where cluster interaction is measured by maximum, rather than aggregate, intercluster similarity. For this class, we characterize which objective functions are admissible for arbitrary symmetric scaling functions and give a complete characterization when the scaling function is a symmetric polynomial of degree at most two.
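To make the two families concrete, the following is a minimal sketch and not the paper's implementation. It assumes the usual form of such objectives: the cost of a binary cluster tree is the sum, over internal nodes with child clusters $A$ and $B$, of an intercluster interaction term times a scaling function $g(|A|, |B|)$. Dasgupta's objective corresponds to $g(a, b) = a + b$ with the interaction taken as the total similarity between $A$ and $B$. Replacing that total by the maximum pairwise similarity is our reading of "max-type" from the description above. The nested-tuple tree encoding and the names `leaves`, `cost`, `g`, and `aggregate` are hypothetical, chosen only for illustration.

```python
from itertools import product


def leaves(tree):
    """Return the leaf labels of a binary tree given as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def cost(tree, w, g=lambda a, b: a + b, aggregate=sum):
    """Cost of a cluster tree under an assumed sum-type or max-type objective.

    w         -- symmetric similarity matrix (list of lists), indexed by leaf label
    g         -- symmetric scaling function of the two child-cluster sizes;
                 the default a + b gives Dasgupta's objective
    aggregate -- sum for the sum-type family, max for the (assumed) max-type family
    """
    if not isinstance(tree, tuple):
        return 0  # a single leaf contributes nothing
    left, right = tree
    A, B = leaves(left), leaves(right)
    # Interaction between the two child clusters split at this node.
    interaction = aggregate(w[i][j] for i, j in product(A, B))
    return (interaction * g(len(A), len(B))
            + cost(left, w, g, aggregate)
            + cost(right, w, g, aggregate))


# Toy similarity matrix: pairs {0, 1} and {2, 3} are similar (3), all else weak (1).
w = [
    [0, 3, 1, 1],
    [3, 0, 1, 1],
    [1, 1, 0, 3],
    [1, 1, 3, 0],
]
natural = ((0, 1), (2, 3))   # merges the similar pairs first
shuffled = ((0, 2), (1, 3))  # splits the similar pairs at the root

print(cost(natural, w), cost(shuffled, w))
print(cost(natural, w, aggregate=max), cost(shuffled, w, aggregate=max))
```

Under Dasgupta's sum-type objective, the tree that keeps the similar pairs together receives the lower cost on this toy input. Passing a different symmetric `g`, for example a degree-two polynomial in the two cluster sizes, lets one experiment with other members of either family; whether a given choice is admissible is exactly the question the paper characterizes.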