We consider the performance of a least-squares regression model, as judged by out-of-sample $R^2$. Shapley values give a fair attribution of the performance of a model to its input features, taking into account interdependencies between features. Evaluating the Shapley values exactly requires solving a number of regression problems that is exponential in the number of features, so a Monte Carlo-type approximation is typically used. We focus on the special case of least-squares regression models, where several tricks can be used to compute and evaluate regression models efficiently. These tricks give a substantial speedup, allowing many more Monte Carlo samples to be evaluated and thereby achieving better accuracy. We refer to our method as least-squares Shapley performance attribution (LS-SPA), and describe our open-source implementation.
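To make the setup concrete, the following is a minimal sketch of permutation-sampling Monte Carlo Shapley attribution of out-of-sample $R^2$ for a least-squares model. All function names here are illustrative; this is the generic baseline method, not the accelerated LS-SPA implementation described in the paper.

```python
# Sketch: Monte Carlo Shapley attribution of out-of-sample R^2.
# Hypothetical helper names; assumes the test targets are centered,
# so the empty feature set (zero predictor) has R^2 = 0.
import numpy as np

def oos_r2(X_tr, y_tr, X_te, y_te, feats):
    """Out-of-sample R^2 of a least-squares fit on a feature subset."""
    if len(feats) == 0:
        pred = np.zeros_like(y_te)
    else:
        theta, *_ = np.linalg.lstsq(X_tr[:, feats], y_tr, rcond=None)
        pred = X_te[:, feats] @ theta
    return 1 - np.sum((y_te - pred) ** 2) / np.sum(y_te ** 2)

def shapley_mc(X_tr, y_tr, X_te, y_te, n_samples=200, seed=0):
    """Estimate Shapley attributions by averaging marginal R^2 gains
    over random feature orderings (permutations)."""
    rng = np.random.default_rng(seed)
    p = X_tr.shape[1]
    attr = np.zeros(p)
    for _ in range(n_samples):
        perm = rng.permutation(p)
        prev = oos_r2(X_tr, y_tr, X_te, y_te, [])
        feats = []
        for j in perm:
            feats.append(j)
            cur = oos_r2(X_tr, y_tr, X_te, y_te, feats)
            attr[j] += cur - prev  # marginal contribution of feature j
            prev = cur
    return attr / n_samples
```

By construction, each permutation's marginal contributions telescope, so the attributions sum exactly to the full model's out-of-sample $R^2$ (the "efficiency" property of Shapley values); the exponential cost of exact evaluation is replaced by `n_samples * p` regression solves, which is the cost LS-SPA's least-squares tricks reduce.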