We consider the performance of a least-squares regression model, as judged by out-of-sample $R^2$. Shapley values give a fair attribution of the performance of a model to its input features, taking into account interdependencies between features. Evaluating the Shapley values exactly requires solving a number of regression problems that is exponential in the number of features, so a Monte Carlo-type approximation is typically used. We focus on the special case of least-squares regression models, where several tricks can be used to compute and evaluate regression models efficiently. These tricks give a substantial speedup, so many more Monte Carlo samples can be evaluated, which yields better accuracy. We refer to our method as least-squares Shapley performance attribution (LS-SPA), and describe our open-source implementation.
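To make the Monte Carlo approximation concrete, here is a minimal sketch of the naive baseline: sample random feature orderings and credit each feature with the change in out-of-sample $R^2$ when it joins the features before it. This is not the LS-SPA implementation and uses none of its speedups; every prefix model is refit from scratch. The function names `out_of_sample_r2` and `shapley_monte_carlo` are hypothetical, and the sketch assumes no intercept (centered data) and that the empty model predicts zero, so its $R^2$ is 0.

```python
import numpy as np


def out_of_sample_r2(X_train, y_train, X_test, y_test, subset):
    """Test-set R^2 of a least-squares fit that uses only the columns in `subset`."""
    if len(subset) == 0:
        # Assumption: the empty model predicts 0, which is the R^2 baseline here.
        return 0.0
    theta, *_ = np.linalg.lstsq(X_train[:, subset], y_train, rcond=None)
    resid = y_test - X_test[:, subset] @ theta
    # Assumption: R^2 is measured relative to the all-zero prediction.
    return 1.0 - (resid @ resid) / (y_test @ y_test)


def shapley_monte_carlo(X_train, y_train, X_test, y_test, num_perms=200, seed=0):
    """Estimate Shapley attributions of out-of-sample R^2 by sampling permutations."""
    rng = np.random.default_rng(seed)
    p = X_train.shape[1]
    attributions = np.zeros(p)
    for _ in range(num_perms):
        perm = rng.permutation(p)
        prev = 0.0  # R^2 of the empty model
        for k in range(p):
            # Refit using the first k + 1 features of this ordering.
            cur = out_of_sample_r2(X_train, y_train, X_test, y_test, perm[: k + 1])
            # Credit the newly added feature with its marginal gain.
            attributions[perm[k]] += cur - prev
            prev = cur
    return attributions / num_perms
```

Within each sampled ordering the marginal gains telescope, so the estimated attributions sum to the $R^2$ of the full model (up to floating-point error) regardless of how many orderings are drawn. Each ordering costs one least-squares solve per feature, which is the cost that a faster method would aim to reduce.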