Shapley values have emerged as a widely accepted and trustworthy tool, grounded in theoretical axioms, for explaining black-box models such as deep neural networks. However, computing Shapley values exactly has complexity exponential in the number of features. Various approaches, including ApproSemivalue, KernelSHAP, and FastSHAP, have been explored to expedite the computation. We analyze the consistency of these existing methods and conclude that stochastic estimators can be unified as a linear transformation of importance sampling over feature subsets. Based on this, we investigate the possibility of designing simple amortized estimators and propose a straightforward and efficient one, SimSHAP, by eliminating redundant techniques. Extensive experiments on tabular and image datasets validate the effectiveness of SimSHAP, which significantly accelerates the computation of accurate Shapley values.
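To make the cost argument concrete, the following is a minimal sketch of exact Shapley computation by subset enumeration next to a sampling-based estimate. It is illustrative only: it uses plain permutation sampling, a standard estimator swapped in here, and is not SimSHAP or any of the methods named above. The function names (`exact_shapley`, `sampled_shapley`) and the toy game `toy_value` are hypothetical.

```python
import itertools
import math
import random


def exact_shapley(value_fn, n):
    """Exact Shapley values by enumerating all subsets (exponential in n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition S of this size: |S|! (n-|S|-1)! / n!
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def sampled_shapley(value_fn, n, num_permutations, seed=0):
    """Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n
    order = list(range(n))
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = frozenset()
        prev = value_fn(coalition)
        for i in order:
            coalition = coalition | {i}
            cur = value_fn(coalition)
            phi[i] += (cur - prev) / num_permutations
            prev = cur
    return phi


# Hypothetical toy game: additive feature weights plus an interaction
# bonus that is paid only when features 0 and 1 are both present.
WEIGHTS = [1.0, 2.0, 3.0, 4.0]


def toy_value(subset):
    bonus = 2.0 if {0, 1} <= subset else 0.0
    return sum(WEIGHTS[j] for j in subset) + bonus


exact = exact_shapley(toy_value, 4)
approx = sampled_shapley(toy_value, 4, num_permutations=2000)
```

In this toy game the interaction bonus is split equally between features 0 and 1, so the exact values are their additive weights plus 1.0 each. The exact routine evaluates every subset for every feature, whereas the sampled routine needs only n + 1 evaluations per permutation, at the price of estimation noise.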