We consider the performance of a least-squares regression model, as judged by out-of-sample $R^2$. Shapley values give a fair attribution of the performance of a model to its input features, taking into account interdependencies between features. Evaluating the Shapley values exactly requires solving a number of regression problems that is exponential in the number of features, so a Monte Carlo-type approximation is typically used. We focus on the special case of least-squares regression models, where several tricks can be used to fit and evaluate the many required regression models efficiently. These tricks give a substantial speedup, which allows many more Monte Carlo samples to be evaluated in the same time and hence yields better accuracy. We refer to our method as least-squares Shapley performance attribution (LS-SPA), and describe our open-source implementation.
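To make the idea concrete, here is a minimal NumPy sketch of Monte Carlo Shapley attribution of out-of-sample $R^2$ for least squares. It is not the LS-SPA implementation or its API, and the function names (`r2`, `prefix_r2`, `shapley_mc`) are hypothetical. It assumes centered data with no intercept, a training matrix with full column rank, and the convention that the empty model has $R^2 = 0$. It illustrates one standard least-squares trick, reusing a single QR factorization to fit every nested model along a sampled feature ordering. This trick may or may not be among those the method actually uses.

```python
import numpy as np

def r2(y_true, y_pred):
    """Out-of-sample R^2 = 1 - SSE / SST, computed on the test set."""
    resid = y_true - y_pred
    centered = y_true - y_true.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def prefix_r2(X_train, y_train, X_test, y_test, perm):
    """Test R^2 of the least-squares fits on features perm[:k], k = 0..p.

    Assumption: X_train[:, perm] has full column rank. With the reduced QR
    factorization X_train[:, perm] = Q R, the first k columns satisfy
    X_perm[:, :k] = Q[:, :k] R[:k, :k], so the k-feature fit solves
    R[:k, :k] beta = Q[:, :k].T @ y_train. One factorization serves all k.
    """
    p = len(perm)
    Q, R = np.linalg.qr(X_train[:, perm])  # reduced QR: Q is n x p, R is p x p
    qty = Q.T @ y_train
    Xt = X_test[:, perm]
    scores = np.zeros(p + 1)  # scores[0] = 0 by convention for the empty model
    for k in range(1, p + 1):
        beta = np.linalg.solve(R[:k, :k], qty[:k])
        scores[k] = r2(y_test, Xt[:, :k] @ beta)
    return scores

def shapley_mc(X_train, y_train, X_test, y_test, num_perms=1000, seed=0):
    """Monte Carlo Shapley estimate: average marginal R^2 gains over random orderings."""
    rng = np.random.default_rng(seed)
    p = X_train.shape[1]
    attrib = np.zeros(p)
    for _ in range(num_perms):
        perm = rng.permutation(p)
        scores = prefix_r2(X_train, y_train, X_test, y_test, perm)
        # np.diff(scores)[k] is the gain from adding feature perm[k] after perm[:k];
        # perm has no repeated indices, so this fancy-indexed update is safe.
        attrib[perm] += np.diff(scores)
    return attrib / num_perms
```

Because the gains along each ordering telescope to the full-model $R^2$, the estimates always sum to that value exactly. The Monte Carlo error affects only how this total is split among the features. Averaging over all $p!$ orderings instead of a random sample would give the exact Shapley values, which is feasible only for small $p$.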