A basic task in explainable AI (XAI) is to identify the most important features behind a prediction made by a black box function $f$. The insertion and deletion tests of Petsiuk et al. (2018) can be used to judge the quality of algorithms that rank pixels from most to least important for a classification. Motivated by regression problems, we establish a formula for their area under the curve (AUC) criteria in terms of certain main effects and interactions in an anchored decomposition of $f$. We find an expression for the expected value of the AUC under a random ordering of the inputs to $f$ and propose an alternative criterion for the regression setting: the area above a straight line. We use this criterion to compare feature importances computed by integrated gradients (IG) to those computed by Kernel SHAP (KS) as well as LIME, DeepLIFT, vanilla gradient, and input$\times$gradient methods. KS has the best overall performance on two datasets we consider, but it is very expensive to compute. We find that IG is nearly as good as KS while being much faster. Our comparison problems include some binary inputs, which pose a challenge to IG because it must use values between the possible variable levels, so we consider ways to handle binary variables in IG. We show that sorting variables by their Shapley value does not necessarily give the optimal ordering for an insertion-deletion test. It does, however, give the optimal ordering for monotone functions of additive models, such as logistic regression.
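To make the insertion test concrete, the following is a minimal sketch in Python rather than the authors' implementation. The function names (`insertion_curve`, `auc`, `area_above_line`), the trapezoid-rule normalization, and the toy linear model are all assumptions chosen for illustration; the paper's exact AUC convention may differ. Features of the target point `x` are switched on one at a time in a given order, starting from a baseline point, and the resulting curve of predictions is summarized by its area relative to the straight line joining its endpoints.

```python
import numpy as np


def insertion_curve(f, x, baseline, order):
    """Evaluate f while features of x are inserted into baseline in `order`."""
    z = np.array(baseline, dtype=float)  # copy, so the caller's baseline is untouched
    values = [f(z)]
    for j in order:
        z[j] = x[j]  # switch feature j from its baseline value to its target value
        values.append(f(z))
    return np.array(values)


def auc(values):
    """Trapezoid-rule area under the curve on an evenly spaced grid over [0, 1].

    Assumption: this normalization is one common convention, not necessarily
    the one used in the paper.
    """
    n = len(values) - 1
    return (values[0] / 2 + values[1:-1].sum() + values[-1] / 2) / n


def area_above_line(values):
    """Area between the curve and the straight line joining its two endpoints."""
    line = np.linspace(values[0], values[-1], len(values))
    return auc(values - line)


# Hypothetical toy model: an additive (linear) function of three inputs.
weights = np.array([3.0, 1.0, 2.0])


def f(z):
    return float(weights @ z)


x = np.ones(3)          # point being explained
baseline = np.zeros(3)  # reference point

good_order = [0, 2, 1]  # largest effect first
bad_order = [1, 2, 0]   # smallest effect first

good_area = area_above_line(insertion_curve(f, x, baseline, good_order))
bad_area = area_above_line(insertion_curve(f, x, baseline, bad_order))
print(good_area, bad_area)
```

In this toy example, inserting the features with the largest effects first lifts the curve above the straight line (area $2/3$), while the reversed ordering drops it below (area $-2/3$). Because the model is additive, ranking by effect size is the best insertion order here, consistent with the statement about monotone functions of additive models.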