Shapley Values (SV) are widely used in explainable AI, but they can be difficult to estimate and interpret, which can lead to inaccurate inferences and explanations. As a starting point, we recall an invariance principle for SV and derive the correct approach for computing the SV of categorical variables, which are particularly sensitive to the encoding used. For tree-based models, we introduce two estimators of Shapley Values that exploit the tree structure efficiently and are more accurate than state-of-the-art methods. Simulations and comparisons with state-of-the-art algorithms show the practical gain of our approach. Finally, we discuss the limitations of Shapley Values as a local explanation. These methods are available as a Python package.