We consider the performance of a least-squares regression model, as judged by out-of-sample $R^2$. Shapley values give a fair attribution of the model's performance to its input features, taking into account interdependencies between features. Evaluating the Shapley values exactly requires solving a number of regression problems that is exponential in the number of features, so a Monte Carlo-type approximation is typically used. We focus on the special case of least-squares regression models, where several tricks can be used to fit and evaluate the regression models efficiently. These tricks give a substantial speedup, allowing many more Monte Carlo samples to be evaluated and thereby achieving better accuracy. We refer to our method as least-squares Shapley performance attribution (LS-SPA), and describe our open-source implementation.
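To make the Monte Carlo approximation concrete, the following is a minimal sketch of permutation-sampling Shapley attribution for out-of-sample $R^2$. It is a naive baseline that refits a least-squares model for every feature subset, not the accelerated LS-SPA method or the API of the open-source implementation. The function names `out_of_sample_r2` and `monte_carlo_shapley` are hypothetical, and the sketch assumes that $R^2$ is measured relative to the zero predictor (no intercept), so that the empty feature set has $R^2 = 0$.

```python
import numpy as np


def out_of_sample_r2(X_train, y_train, X_test, y_test, subset):
    """Out-of-sample R^2 of a least-squares fit restricted to the columns in `subset`.

    Assumption: R^2 = 1 - ||y_test - X_test @ theta||^2 / ||y_test||^2,
    i.e. the baseline is the zero predictor, so the empty subset scores 0.
    """
    cols = list(subset)
    if len(cols) == 0:
        return 0.0
    # Fit on the training data using only the selected features.
    theta, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
    resid = y_test - X_test[:, cols] @ theta
    return 1.0 - (resid @ resid) / (y_test @ y_test)


def monte_carlo_shapley(X_train, y_train, X_test, y_test, num_perms=200, seed=0):
    """Estimate Shapley attributions by averaging R^2 lifts over random feature orderings."""
    rng = np.random.default_rng(seed)
    p = X_train.shape[1]
    attributions = np.zeros(p)
    for _ in range(num_perms):
        perm = rng.permutation(p)
        prev = 0.0  # R^2 of the empty feature set
        for k in range(p):
            # Add features one at a time in the sampled order and credit
            # the newly added feature with the resulting change in R^2.
            cur = out_of_sample_r2(X_train, y_train, X_test, y_test, perm[: k + 1])
            attributions[perm[k]] += cur - prev
            prev = cur
    return attributions / num_perms
```

Each sampled ordering costs one least-squares fit per feature in this sketch, which is the cost that the least-squares-specific tricks are meant to reduce. Because the lifts along any ordering telescope, the estimated attributions sum to the out-of-sample $R^2$ of the full model.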