Over the last few years, explainable clustering has attracted considerable attention. Dasgupta et al. [ICML'20] initiated the study of the explainable $k$-means and $k$-median clustering problems, where the explanation is captured by a threshold decision tree that partitions the space at each node using axis-parallel hyperplanes. Recently, Laber et al. [Pattern Recognition'23] made the case for treating the depth of the decision tree as an additional complexity measure of interest. In this work, we prove that even when the input points lie in the Euclidean plane, any reduction in the depth of the explanation incurs an unbounded loss in the $k$-means and $k$-median cost. Formally, we show that there exists a data set $X \subseteq \mathbb{R}^2$ for which there is a decision tree of depth $k-1$ whose $k$-means/$k$-median cost matches the optimal clustering cost of $X$, but every decision tree of depth less than $k-1$ has unbounded cost with respect to the optimal cost of clustering $X$. We extend our results to the $k$-center objective as well, albeit with weaker guarantees.
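To make the objects in the abstract concrete: a threshold decision tree routes each point through axis-parallel cuts of the form $x_i \le \theta$, and each leaf names one cluster, so a tree with $k$ leaves needs depth at least $\lceil \log_2 k \rceil$ and depth $k-1$ always suffices. The following is a minimal sketch, not taken from the paper; the `Node` structure and all names are illustrative assumptions. It builds a depth-$2$ (i.e., $k-1$ for $k=3$) threshold tree over points in the plane and evaluates the $k$-means cost of the clustering it induces.

```python
import numpy as np

class Node:
    """Hypothetical threshold-tree node: an internal node holds an
    axis-parallel cut (x[feature] <= threshold); a leaf holds a cluster id."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, cluster=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.cluster = left, right, cluster

def assign(tree, x):
    """Route a point down the tree; the leaf reached is its cluster."""
    node = tree
    while node.cluster is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.cluster

def kmeans_cost(X, labels):
    """k-means cost: sum of squared distances to each cluster's centroid."""
    cost = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return cost

# Three well-separated groups in R^2 and a depth-2 tree (two cuts, three leaves).
X = np.array([[0.0, 0.0], [0.1, 0.2],
              [5.0, 5.0], [5.1, 4.9],
              [10.0, 0.0], [9.9, 0.1]])
tree = Node(feature=0, threshold=2.5,
            left=Node(cluster=0),
            right=Node(feature=1, threshold=2.5,
                       left=Node(cluster=2),
                       right=Node(cluster=1)))
labels = np.array([assign(tree, x) for x in X])
print(kmeans_cost(X, labels))  # small: the tree recovers the optimal partition
```

The paper's lower bound concerns exactly this trade-off: for its hard instances, no tree shallower than depth $k-1$ can induce a partition whose cost is within any bounded factor of the optimum.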