We consider a global representation of a regression or classification function obtained by decomposing it into a sum of main and interaction components of arbitrary order. We propose a new identification constraint that allows for the extraction of interventional SHAP values and partial dependence plots, thereby unifying local and global explanations. Under the proposed identification, a feature's partial dependence plot corresponds to its main effect term plus the intercept. The interventional SHAP value of feature $k$ is a weighted sum of the main component of $k$ and all interaction components that include $k$, where each component is weighted by the reciprocal of its dimension. This brings a new perspective to local explanations such as SHAP values, which were previously motivated only by game theory. We show that the decomposition can be used to reduce direct and indirect bias by removing all components that include a protected feature. Lastly, we motivate a new measure of feature importance. In principle, our proposed functional decomposition can be applied to any machine learning model, but exact calculation is feasible only for low-dimensional structures or ensembles of those. We provide an algorithm and an efficient implementation for gradient-boosted trees (xgboost) and random planted forests. Our experiments suggest that the method provides meaningful explanations and reveals interactions of higher orders. The proposed methods are implemented in an R package, available at \url{https://github.com/PlantedML/glex}.
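The two identities stated above (partial dependence equals main effect plus intercept, and the SHAP value of feature $k$ equals the sum of components containing $k$, each divided by its dimension) can be checked numerically on a toy example. The following is a minimal sketch, not the algorithm or API of the glex package. It assumes that the components are obtained by inclusion-exclusion over empirical partial dependence functions, which is one construction consistent with the properties described here. All function names, the toy model, and the data are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def subsets(features):
    """All subsets of `features` as sorted tuples, including the empty tuple."""
    features = sorted(features)
    return [c for r in range(len(features) + 1) for c in combinations(features, r)]


def partial_dependence(model, data, x, subset):
    """Average prediction with the features in `subset` fixed to their values
    in x and the remaining features taken from each row of the background data."""
    total = 0.0
    for row in data:
        z = list(row)
        for j in subset:
            z[j] = x[j]
        total += model(z)
    return total / len(data)


def components(model, data, x):
    """Assumed construction: component m_S(x_S) for every subset S, computed by
    inclusion-exclusion over partial dependence values. The empty tuple holds
    the intercept."""
    pd = {S: partial_dependence(model, data, x, S) for S in subsets(range(len(x)))}
    return {
        S: sum((-1) ** (len(S) - len(V)) * pd[V] for V in subsets(S))
        for S in pd
    }


def shap_from_components(comps, k):
    """Sum of all components containing k, each weighted by 1 / |S|."""
    return sum(m / len(S) for S, m in comps.items() if k in S)


def shap_brute_force(model, data, x, k):
    """Interventional Shapley value of feature k from the classical
    game-theoretic formula, with partial dependence as the value function."""
    d = len(x)
    others = [j for j in range(d) if j != k]
    phi = 0.0
    for S in subsets(others):
        weight = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
        with_k = tuple(sorted(S + (k,)))
        gain = partial_dependence(model, data, x, with_k) - partial_dependence(model, data, x, S)
        phi += weight * gain
    return phi


def toy_model(z):
    """Hypothetical model with a main effect, a two-way and a three-way interaction."""
    return 2.0 * z[0] + z[1] * z[2] + 0.5 * z[0] * z[1] * z[2]


# Hypothetical background data and the point to be explained.
data = [(0.0, 1.0, 2.0), (1.0, 0.0, 1.0), (2.0, 3.0, 0.0), (1.0, 2.0, 2.0)]
x = (1.0, 2.0, 3.0)
comps = components(toy_model, data, x)

for k in range(len(x)):
    print(k, shap_from_components(comps, k), shap_brute_force(toy_model, data, x, k))
```

In this sketch, the two columns printed for each feature should agree up to floating-point error, and the components should sum to the prediction at `x`. The cost grows exponentially with the number of features, which illustrates why exact calculation is restricted to low-dimensional structures.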