Counterfactuals are recognized as an effective way to explain classifier decisions, yet they have not been considered in the context of clustering. In this work, we propose using counterfactuals to explain clustering solutions. First, we present a general definition of counterfactuals for model-based clustering that includes plausibility and feasibility constraints. We then consider the counterfactual generation problem for k-means and Gaussian clustering under Euclidean distance. Our approach takes as input the factual, the target cluster, a binary mask marking each feature as actionable or immutable, and a plausibility factor specifying how far from the cluster boundary the counterfactual should be placed. For k-means clustering, we present closed-form formulas for the optimal solution; for Gaussian clustering (with full, diagonal, or spherical covariances), our method requires only the numerical solution of a nonlinear equation in a single parameter. We demonstrate the advantages of our approach through illustrative examples and quantitative experimental comparisons.
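To make the k-means setting concrete, the following is a minimal sketch and not the paper's own formulas. It assumes a simplified two-centroid view in which only the factual's current centroid and the target centroid matter, so the cluster boundary is their bisecting hyperplane. The function name `kmeans_counterfactual` and the parameter `margin` (taken here as the Euclidean distance the counterfactual is pushed past that boundary, standing in for the plausibility factor) are hypothetical choices made for illustration.

```python
from math import sqrt


def kmeans_counterfactual(x, mu_s, mu_t, mask, margin=0.0):
    """Closest point to x (Euclidean) lying `margin` beyond the hyperplane
    that bisects mu_s and mu_t, changing only features where mask is 1.

    Assumption: only the source and target centroids are considered, so
    other clusters are ignored. `margin` is a hypothetical stand-in for
    the plausibility factor.
    """
    # A point y is closer to mu_t than to mu_s iff w.y > b.
    w = [t - s for s, t in zip(mu_s, mu_t)]
    b = 0.5 * (sum(t * t for t in mu_t) - sum(s * s for s in mu_s))
    w_norm = sqrt(sum(wi * wi for wi in w))
    if w_norm == 0.0:
        raise ValueError("source and target centroids coincide")

    # How much w.x must grow to reach the shifted boundary.
    gap = b + margin * w_norm - sum(wi * xi for wi, xi in zip(w, x))
    if gap <= 0.0:
        return list(x)  # x already satisfies the constraint

    # Only actionable coordinates may move, so project w onto them.
    d = [wi * mi for wi, mi in zip(w, mask)]
    d_sq = sum(di * di for di in d)
    if d_sq == 0.0:
        raise ValueError("no actionable feature can move x toward the target")

    # Minimum-norm step along d that closes the gap exactly.
    lam = gap / d_sq
    return [xi + lam * di for xi, di in zip(x, d)]


# All features actionable: the counterfactual lands on the midpoint.
print(kmeans_counterfactual([0.0, 0.0], [0.0, 0.0], [4.0, 0.0], [1, 1]))
# First feature immutable: only the second coordinate changes.
print(kmeans_counterfactual([0.0, 0.0], [0.0, 0.0], [4.0, 4.0], [0, 1]))
```

Under these assumptions the solution is a single step along the masked direction between the centroids, which shows why a closed form is plausible for k-means. With more than two clusters, the result would still need to be checked against the remaining centroids.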