Explainable clustering has attracted considerable attention over the last few years. Dasgupta et al. [ICML'20] initiated the study of explainable k-means and k-median clustering, in which the explanation is a threshold decision tree that partitions the space at each node using an axis-parallel hyperplane. Recently, Laber et al. [Pattern Recognition'23] made the case for treating the depth of the decision tree as an additional complexity measure of interest. In this work, we prove that even when the input points lie in the Euclidean plane, any reduction in the depth of the explanation incurs an unbounded loss in the k-means and k-median cost. Formally, we show that there exists a data set X in the Euclidean plane for which there is a decision tree of depth k-1 whose k-means/k-median cost matches the optimal clustering cost of X, but every decision tree of depth less than k-1 has unbounded cost relative to the optimal clustering cost. We extend our results to the k-center objective as well, albeit with weaker guarantees.
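To make the objects in the statement concrete, the following is a minimal Python sketch of a threshold decision tree over points in the plane, together with its depth and the k-means cost of the partition it induces. The names `Leaf`, `Node`, `depth`, `leaves`, and `kmeans_cost`, and the four-point data set, are hypothetical and chosen only for illustration; this is not the construction used in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Point = Tuple[float, float]


@dataclass
class Leaf:
    """A leaf of the tree; each leaf corresponds to one cluster."""


@dataclass
class Node:
    """An internal node that cuts the plane with an axis-parallel line."""
    axis: int         # 0 cuts on the x-coordinate, 1 on the y-coordinate
    threshold: float  # points with coordinate <= threshold go left
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def depth(tree: Tree) -> int:
    """Number of cuts on the longest root-to-leaf path."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + max(depth(tree.left), depth(tree.right))


def leaves(tree: Tree, points: List[Point]) -> List[List[Point]]:
    """Split the points into the clusters induced by the leaves of the tree."""
    if isinstance(tree, Leaf):
        return [points]
    lo = [p for p in points if p[tree.axis] <= tree.threshold]
    hi = [p for p in points if p[tree.axis] > tree.threshold]
    return leaves(tree.left, lo) + leaves(tree.right, hi)


def kmeans_cost(clusters: List[List[Point]]) -> float:
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for cluster in clusters:
        if not cluster:
            continue  # an empty leaf contributes nothing
        cx = sum(p[0] for p in cluster) / len(cluster)
        cy = sum(p[1] for p in cluster) / len(cluster)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in cluster)
    return total


# Illustrative data: two well-separated pairs, so k = 2.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
tree = Node(axis=0, threshold=5.0, left=Leaf(), right=Leaf())

tree_depth = depth(tree)
tree_cost = kmeans_cost(leaves(tree, points))
```

In this toy instance a single vertical cut (depth 1 = k-1) separates the two pairs. The result above concerns data sets on which no tree of depth less than k-1 can keep the cost bounded relative to the optimum.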