Hierarchical clustering is a fundamental task in data analysis, but classical methods have long lacked a principled objective function. Dasgupta [STOC 2016] took an important step toward addressing this gap by proposing a well-motivated objective function for cluster trees. Cohen-Addad et al. [J. ACM 2019] subsequently introduced the notion of admissibility: an objective function is admissible if, whenever the input similarity matrix admits generating trees, its minimizers are precisely those generating trees. They also gave a necessary and sufficient condition for admissibility within a family of objective functions based on aggregate intercluster similarity. We refer to this family as sum-type objective functions. However, apart from Dasgupta's original objective function, no explicit admissible objective functions in this family were provided. In this paper, we study admissible objective functions for hierarchical clustering in two directions. For sum-type objective functions, we give a complete characterization when the scaling function is a symmetric polynomial of degree at most two, and we derive sufficient conditions for degree-three polynomials. We also show that the recursive sparsest cut algorithm achieves an $O(\phi)$-approximation ratio for the admissible objective functions covered by our characterization, where $\phi$ is the approximation factor of the sparsest cut subroutine. We then introduce max-type objective functions, where cluster interaction is measured by maximum, rather than aggregate, intercluster similarity. For this class, we characterize which objective functions are admissible for arbitrary symmetric scaling functions and give a complete characterization when the scaling function is a symmetric polynomial of degree at most two.
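As a rough illustration of the objectives discussed above, the following Python sketch evaluates a cost of the form "intercluster similarity times a scaling function of the two cluster sizes", summed over the internal nodes of a binary cluster tree. With the scaling function g(a, b) = a + b and summed similarity this is Dasgupta's objective. The nested-tuple tree encoding, the function names, the toy similarity matrix, and the use of `max` as a stand-in for a max-type objective are all assumptions made for illustration; they are not taken from the paper.

```python
from itertools import product


def leaves(tree):
    """Return the leaf labels of a binary tree given as nested 2-tuples of ints."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def cluster_tree_cost(tree, w, g, aggregate=sum):
    """Cost of a binary cluster tree (hypothetical helper, not from the paper).

    Each internal node with child leaf sets A and B contributes
    aggregate(w[i][j] for i in A, j in B) * g(|A|, |B|).
    aggregate=sum gives a sum-type cost; aggregate=max is one possible
    reading of a max-type cost (an assumption).
    """
    if isinstance(tree, int):
        return 0.0
    left, right = tree
    a, b = leaves(left), leaves(right)
    interaction = aggregate(w[i][j] for i, j in product(a, b))
    return (interaction * g(len(a), len(b))
            + cluster_tree_cost(left, w, g, aggregate)
            + cluster_tree_cost(right, w, g, aggregate))


def dasgupta_g(a, b):
    """Scaling function of Dasgupta's objective: number of leaves under the node."""
    return a + b


# Toy similarity matrix: points {0, 1} and {2, 3} are similar within each
# pair (weight 3) and dissimilar across pairs (weight 1).
w = [
    [0, 3, 1, 1],
    [3, 0, 1, 1],
    [1, 1, 0, 3],
    [1, 1, 3, 0],
]

natural_tree = ((0, 1), (2, 3))   # splits the two similar pairs apart first
shuffled_tree = ((0, 2), (1, 3))  # mixes the pairs

print(cluster_tree_cost(natural_tree, w, dasgupta_g))
print(cluster_tree_cost(shuffled_tree, w, dasgupta_g))
```

On this toy input the tree that separates the two similar pairs at the root has cost 28 under Dasgupta's objective, while the tree that mixes them has cost 36, so the objective prefers the former. Other symmetric scaling functions can be tried by passing a different `g`.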