Shapley values have emerged as a central game-theoretic tool in explainable AI (XAI). However, computing Shapley values exactly requires $2^d$ game evaluations for a model with $d$ features. Lundberg and Lee's KernelSHAP algorithm has become a leading method for avoiding this exponential cost. KernelSHAP approximates Shapley values by approximating the game as a linear function, which is fit using a small number of game evaluations on random feature subsets. In this work, we extend KernelSHAP by approximating the game via higher-degree polynomials, which capture non-linear interactions between features. Our resulting PolySHAP method yields empirically better Shapley value estimates on a variety of benchmark datasets, and we prove that these estimates are consistent. Moreover, we connect our approach to paired sampling (antithetic sampling), a ubiquitous modification to KernelSHAP that improves empirical accuracy. We prove that paired sampling outputs exactly the same Shapley value approximations as second-order PolySHAP, without ever fitting a degree-2 polynomial. To the best of our knowledge, this finding provides the first strong theoretical justification for the excellent practical performance of the paired sampling heuristic.
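To make the linear-regression view of KernelSHAP concrete, the following is a minimal sketch (not the authors' implementation): random feature subsets are drawn, the game is evaluated on each, and Shapley values are estimated as the coefficients of a weighted least-squares linear fit using the Shapley kernel weights. For brevity this sketch omits the efficiency constraint (that the coefficients sum to $v(N) - v(\emptyset)$) enforced in the full algorithm; the function names and sampling scheme are illustrative.

```python
import math
import numpy as np

def shapley_kernel_weight(d, s):
    # Shapley kernel weight for a coalition of size s, with 0 < s < d
    # (the empty and full coalitions receive infinite weight and are
    # handled separately in the full algorithm, so we skip them here).
    return (d - 1) / (math.comb(d, s) * s * (d - s))

def kernelshap_sketch(game, d, n_samples=2000, seed=0):
    """Estimate Shapley values for a game on d players by fitting a
    weighted linear surrogate on randomly sampled coalitions."""
    rng = np.random.default_rng(seed)
    Z, y, w = [], [], []
    for _ in range(n_samples):
        s = int(rng.integers(1, d))              # coalition size in 1..d-1
        S = rng.choice(d, size=s, replace=False) # random coalition
        z = np.zeros(d)
        z[S] = 1.0                               # binary membership vector
        Z.append(z)
        y.append(game(S))
        w.append(shapley_kernel_weight(d, s))
    # Weighted least squares: scale rows by sqrt of the kernel weights.
    X = np.column_stack([np.ones(n_samples), np.array(Z)])
    sw = np.sqrt(np.array(w))
    coef, *_ = np.linalg.lstsq(X * sw[:, None], np.array(y) * sw, rcond=None)
    return coef[1:]  # drop the intercept; the rest estimate Shapley values

# Usage: for an additive game the Shapley values equal the per-feature
# contributions, and the linear fit recovers them exactly.
values = np.array([1.0, -2.0, 0.5])
phi = kernelshap_sketch(lambda S: values[list(S)].sum(), d=3)
```

Because an additive game is itself a linear function of the membership vector, the regression has zero residual in this example; for a real model the fit is only approximate, which is the gap that higher-degree polynomial surrogates (as in PolySHAP) aim to close.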