In this paper, we consider a perturbation-based metric of the predictive faithfulness of feature rankings (or attributions), which we call PGI squared. When applied to decision tree-based regression models, the metric can be computed accurately and efficiently for arbitrary independent feature perturbation distributions. In particular, the computation does not involve Monte Carlo sampling, which has typically been used for computing similar metrics and is inherently prone to inaccuracies. Moreover, we propose a method, based on PGI squared, of ranking features by their importance for the tree model's predictions. Our experiments indicate that, in some respects, the method may identify the globally important features better than the state-of-the-art SHAP explainer.
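To illustrate why sampling can be avoided for tree models, the following is a minimal sketch and not the paper's algorithm. It assumes that the quantity of interest is an expected squared prediction gap, E[(f(x) - f(X'))^2], where X' equals x except that a chosen set of features is redrawn independently. The exact definition of PGI squared is not given in this abstract, so this assumption may differ from it. Because the leaves of a tree partition the input space into axis-aligned boxes, the probability of reaching each leaf factorizes into per-feature interval probabilities, and the expectation becomes a finite sum over leaves. The toy tree, the dictionary layout, and the names `predict`, `leaf_boxes`, `expected_squared_gap`, and `uniform_cdf` are all hypothetical.

```python
import math

# Hypothetical toy regression tree: an internal node sends x to the left
# child when x[feature] <= threshold, otherwise to the right child.
TREE = {
    "feature": 0, "threshold": 0.5,
    "left": {"value": 1.0},
    "right": {
        "feature": 1, "threshold": 0.0,
        "left": {"value": 3.0},
        "right": {"value": 5.0},
    },
}


def predict(node, x):
    """Follow the split rules down to a leaf and return its value."""
    while "value" not in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]


def leaf_boxes(node, bounds=None):
    """Yield (leaf value, {feature: (lo, hi)}); a leaf covers lo < x_j <= hi."""
    bounds = bounds or {}
    if "value" in node:
        yield node["value"], bounds
        return
    j, t = node["feature"], node["threshold"]
    lo, hi = bounds.get(j, (-math.inf, math.inf))
    yield from leaf_boxes(node["left"], {**bounds, j: (lo, min(hi, t))})
    yield from leaf_boxes(node["right"], {**bounds, j: (max(lo, t), hi)})


def expected_squared_gap(tree, x, cdfs):
    """Exact E[(f(x) - f(X'))^2] with no sampling (assumed quantity, see text).

    cdfs maps each perturbed feature index to the CDF of its independent
    replacement distribution; all other features keep their value from x.
    """
    fx = predict(tree, x)
    total = 0.0
    for value, box in leaf_boxes(tree):
        prob = 1.0  # probability that the perturbed point lands in this leaf
        for j, (lo, hi) in box.items():
            if j in cdfs:
                prob *= max(0.0, cdfs[j](hi) - cdfs[j](lo))
            else:
                prob *= 1.0 if lo < x[j] <= hi else 0.0
        total += prob * (fx - value) ** 2
    return total


def uniform_cdf(a, b):
    """CDF of the uniform distribution on [a, b]."""
    return lambda t: min(1.0, max(0.0, (t - a) / (b - a)))


x = [0.2, -1.0]
gap_one = expected_squared_gap(TREE, x, {0: uniform_cdf(0.0, 1.0)})
gap_both = expected_squared_gap(
    TREE, x, {0: uniform_cdf(0.0, 1.0), 1: uniform_cdf(-1.0, 1.0)}
)
print(gap_one, gap_both)
```

In this sketch the cost is one pass over the leaves, and the result has no sampling variance. A Monte Carlo estimate of the same quantity would converge to it only as the number of samples grows.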