Attribution scores reflect how important the feature values in an input entity are for the output of a machine learning model. One of the most popular attribution scores is the SHAP score, which is an instantiation of the general Shapley value used in coalition game theory. The definition of this score relies on a probability distribution on the entity population. Since the exact distribution is generally unknown, it must be assigned subjectively or estimated from data, which may lead to misleading feature scores. In this paper, we propose a principled framework for reasoning about SHAP scores under unknown entity population distributions. In our framework, we consider an uncertainty region that contains the potential distributions, and the SHAP score of a feature becomes a function defined over this region. We study the basic problems of finding the maxima and minima of this function, which allows us to determine tight ranges for the SHAP scores of all features. In particular, we pinpoint the complexity of these problems, and of other related ones, showing them to be NP-complete. Finally, we present experiments on a real-world dataset, showing that our framework can contribute to more robust feature scoring.
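To make the idea of a SHAP score as a function over an uncertainty region concrete, the following is a minimal Python sketch, not the method of the paper. It rests on assumptions that the abstract does not state: features are binary, the population distribution is a product distribution with one parameter P(x_j = 1) per feature, and the uncertainty region is a box given by one interval per parameter. Under these assumptions each conditional expectation is multilinear in the parameters, so the SHAP score is too, and its extremes over the box are attained at corners. The sketch simply enumerates the corners by brute force, which takes exponential time. All names (`expected_output`, `shap_score`, `shap_range`, the toy `model`) are hypothetical.

```python
from itertools import combinations, product
from math import factorial


def expected_output(model, entity, fixed, probs):
    """E[model(x)] when the features in `fixed` keep the entity's values and
    every other feature j is drawn independently with P(x_j = 1) = probs[j]."""
    n = len(entity)
    free = [j for j in range(n) if j not in fixed]
    total = 0.0
    for values in product([0, 1], repeat=len(free)):
        x = list(entity)
        weight = 1.0
        for j, v in zip(free, values):
            x[j] = v
            weight *= probs[j] if v == 1 else 1.0 - probs[j]
        total += weight * model(x)
    return total


def shap_score(model, entity, feature, probs):
    """Exact Shapley value of `feature`, with the coalition value of a
    feature subset S defined as expected_output(model, entity, S, probs)."""
    n = len(entity)
    others = [j for j in range(n) if j != feature]
    score = 0.0
    for size in range(n):
        coeff = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            s = set(subset)
            with_f = expected_output(model, entity, s | {feature}, probs)
            without_f = expected_output(model, entity, s, probs)
            score += coeff * (with_f - without_f)
    return score


def shap_range(model, entity, feature, intervals):
    """Min and max SHAP score over the corners of the box of parameters
    (assumption: product distributions, one interval per feature)."""
    scores = [
        shap_score(model, entity, feature, corner)
        for corner in product(*intervals)
    ]
    return min(scores), max(scores)


# Hypothetical toy classifier over three binary features.
def model(x):
    return int(x[0] == 1 and (x[1] == 1 or x[2] == 1))


entity = (1, 1, 0)
intervals = [(0.2, 0.8), (0.3, 0.7), (0.4, 0.6)]

if __name__ == "__main__":
    for f in range(len(entity)):
        low, high = shap_range(model, entity, f, intervals)
        print(f"feature {f}: SHAP score in [{low:.4f}, {high:.4f}]")
```

If the intervals of two features overlap, their relative ranking depends on which distribution in the region is the true one, which is the kind of sensitivity that score ranges are meant to expose.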