Shapley values have emerged as a central game-theoretic tool in explainable AI (XAI). However, computing Shapley values exactly requires $2^d$ game evaluations for a model with $d$ features. Lundberg and Lee's KernelSHAP algorithm is a leading method for avoiding this exponential cost. KernelSHAP estimates Shapley values by approximating the game with a linear function, fit using a small number of game evaluations on random feature subsets. In this work, we extend KernelSHAP by approximating the game via higher-degree polynomials, which capture non-linear interactions between features. Our resulting PolySHAP method yields empirically better Shapley value estimates on various benchmark datasets, and we prove that these estimates are consistent. Moreover, we connect our approach to paired sampling (antithetic sampling), a ubiquitous modification to KernelSHAP that improves empirical accuracy. We prove that paired sampling outputs exactly the same Shapley value approximations as second-order PolySHAP, without ever fitting a degree-2 polynomial. To the best of our knowledge, this finding provides the first strong theoretical justification for the excellent practical performance of the paired sampling heuristic.
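The linear-surrogate view of KernelSHAP described above can be illustrated concretely. The sketch below (an illustration of the standard KernelSHAP formulation, not the PolySHAP method proposed here) uses a small hypothetical cooperative game with $d=4$ players and one pairwise interaction. It computes exact Shapley values by enumeration, then recovers them by fitting a linear function of subset-membership indicators via weighted least squares with the Shapley kernel weights, subject to the efficiency constraint that the coefficients sum to $v(\text{full}) - v(\emptyset)$:

```python
import numpy as np
from itertools import combinations
from math import comb, factorial

d = 4

def v(S):
    # Toy cooperative game (hypothetical): additive value per player,
    # plus a pairwise interaction between players 0 and 1.
    z = np.zeros(d)
    z[list(S)] = 1.0
    return float(z.sum() + 2.0 * z[0] * z[1])

# Exact Shapley values by direct enumeration over all subsets (feasible for small d).
phi_exact = np.zeros(d)
for i in range(d):
    others = [j for j in range(d) if j != i]
    for r in range(d):
        for S in combinations(others, r):
            weight = factorial(r) * factorial(d - r - 1) / factorial(d)
            phi_exact[i] += weight * (v(set(S) | {i}) - v(S))

# KernelSHAP formulation: weighted least squares over subset indicators,
# here with full enumeration of proper nonempty subsets and the exact
# Shapley kernel weights (in practice, subsets are randomly sampled).
rows, ys, ws = [], [], []
for s in range(1, d):
    for S in combinations(range(d), s):
        z = np.zeros(d)
        z[list(S)] = 1.0
        rows.append(z)
        ys.append(v(S))
        ws.append((d - 1) / (comb(d, s) * s * (d - s)))
Z, y, w = np.array(rows), np.array(ys), np.array(ws)
v0, vF = v(()), v(range(d))

# Eliminate the efficiency constraint by substituting
#   phi_{d-1} = (vF - v0) - sum(phi_0, ..., phi_{d-2}),
# then solve the reduced weighted least-squares problem.
A = Z[:, :-1] - Z[:, [-1]]
b = y - v0 - Z[:, -1] * (vF - v0)
W = np.diag(w)
theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
phi_kernel = np.append(theta, (vF - v0) - theta.sum())

print(np.allclose(phi_kernel, phi_exact))  # True: the linear surrogate
# fit with Shapley kernel weights recovers the exact Shapley values
```

With full enumeration and exact kernel weights the regression recovers the Shapley values exactly; the sampling error in practice comes from evaluating the game on only a small random collection of subsets.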