Originating in cooperative game theory, Shapley values have become one of the most widely used measures of variable importance in applied machine learning. However, the statistical understanding of Shapley values is still limited. In this paper, we take a nonparametric (or smoothing) perspective by introducing Shapley curves as a local measure of variable importance. We consider two estimation strategies and derive consistency and asymptotic normality of the estimators under both independence and dependence among the features. We further propose a novel version of the wild bootstrap procedure specifically adjusted for Shapley curves, which allows us to construct confidence intervals and conduct inference. The asymptotic results are validated in extensive experiments. In an empirical application, we analyze which attributes drive the prices of vehicles.
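For readers unfamiliar with the game-theoretic origin mentioned above, the following is a minimal sketch of the classical Shapley value for a small cooperative game. It is not the estimator studied in the paper: the function names `shapley_values` and `toy_value`, and the toy payoffs, are assumptions chosen for illustration. The abstract does not state how the value function for Shapley curves is defined, so none is assumed here.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a cooperative game.

    players: list of hashable player labels.
    value:   function mapping a frozenset of players to a payoff
             (hypothetical here; in a variable-importance setting it
             would score a subset of features).
    """
    d = len(players)
    phi = {}
    for j in players:
        others = [p for p in players if p != j]
        total = 0.0
        for size in range(d):
            # Weight of a coalition S with |S| = size: |S|! (d - |S| - 1)! / d!
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of player j when joining coalition S
                total += weight * (value(s | {j}) - value(s))
        phi[j] = total
    return phi


# Toy game (an assumption for illustration): additive payoffs plus an
# interaction bonus earned only when both "a" and "b" are present.
def toy_value(coalition):
    base = {"a": 1.0, "b": 2.0, "c": 3.0}
    payoff = sum(base[p] for p in coalition)
    if {"a", "b"} <= coalition:
        payoff += 4.0
    return payoff


phi = shapley_values(["a", "b", "c"], toy_value)
```

In this toy game the interaction bonus is split equally between "a" and "b", and the values sum to the payoff of the full coalition. Exact enumeration visits every subset, so the cost grows exponentially in the number of players and the sketch is only practical for a handful of them.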