Despite the wide use of $k$-Nearest Neighbors as classification models, their explainability properties remain poorly understood from a theoretical perspective. While nearest neighbor classifiers offer interpretability from a ``data perspective'', in which the classification of an input vector $\bar{x}$ is explained by identifying the vectors $\bar{v}_1, \ldots, \bar{v}_k$ in the training set that determine it, we argue that such explanations can be impractical in high-dimensional applications, where each vector has hundreds or thousands of features and their relative importance is unclear. Hence, we focus on understanding nearest neighbor classifications through a ``feature perspective'', in which the goal is to identify how the values of the features in $\bar{x}$ affect its classification. Concretely, we study abductive explanations such as ``minimum sufficient reasons'', which correspond to sets of features in $\bar{x}$ that suffice to guarantee its classification, and counterfactual explanations based on the minimum-distance feature changes one would have to perform on $\bar{x}$ to change its classification. We present a detailed landscape of positive and negative complexity results for counterfactual and abductive explanations, distinguishing between discrete and continuous feature spaces, and considering the impact of the choice of distance function. Finally, we show that despite some negative complexity results, Integer Quadratic Programming and SAT solving allow explanations to be computed in practice.