Shapley values are among the most popular tools for explaining the predictions of black-box machine learning models. However, their high computational cost motivates the use of sampling approximations, which introduce considerable uncertainty into the resulting explanations. To stabilize these model explanations, we propose ControlSHAP, an approach based on the Monte Carlo technique of control variates. Our methodology is applicable to any machine learning model and requires virtually no extra computation or modeling effort. On several high-dimensional datasets, we find it can produce dramatic reductions in the Monte Carlo variability of Shapley estimates.
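To make the control-variate idea concrete, the following is a minimal generic sketch, not the ControlSHAP estimator itself. It assumes a toy target, the mean of exp(U) for U uniform on [0, 1], and uses U itself as the control variate because its mean (0.5) is known exactly. The function names `control_variate_estimate` and `compare_variances` are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def control_variate_estimate(f_vals, g_vals, g_mean):
    """Sample mean of f_vals, corrected with a control variate.

    g_vals are samples of a correlated quantity whose true mean g_mean
    is known. The coefficient beta = Cov(f, g) / Var(g) is estimated
    from the same samples (an assumption made for simplicity here).
    """
    f_bar = statistics.fmean(f_vals)
    g_bar = statistics.fmean(g_vals)
    g_var = statistics.variance(g_vals)
    if g_var == 0.0:
        # A constant control carries no information; fall back to the plain mean.
        return f_bar
    cov = sum((f - f_bar) * (g - g_bar) for f, g in zip(f_vals, g_vals))
    cov /= len(f_vals) - 1
    beta = cov / g_var
    # Subtract the part of the sampling error that the control explains.
    return f_bar - beta * (g_bar - g_mean)


def compare_variances(n_samples=200, n_repeats=300, seed=0):
    """Variance across repeated runs of the plain and adjusted estimators."""
    rng = random.Random(seed)
    plain, adjusted = [], []
    for _ in range(n_repeats):
        u = [rng.random() for _ in range(n_samples)]
        f_vals = [math.exp(x) for x in u]  # toy target: E[exp(U)] = e - 1
        plain.append(statistics.fmean(f_vals))
        adjusted.append(control_variate_estimate(f_vals, u, 0.5))
    return statistics.variance(plain), statistics.variance(adjusted)


if __name__ == "__main__":
    plain_var, cv_var = compare_variances()
    print(f"plain Monte Carlo variance:  {plain_var:.3e}")
    print(f"with control variate:        {cv_var:.3e}")
```

The adjustment helps only to the extent that the control is correlated with the quantity being estimated. In a Shapley setting, the control would presumably be a cheap quantity evaluated on the same sampled subsets or permutations and whose exact value is available; that choice is specific to the method and is not reproduced here.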