Shapley values are among the most popular tools for explaining the predictions of black-box machine learning models. However, their high computational cost motivates the use of sampling approximations, which introduce a considerable degree of uncertainty. To stabilize these model explanations, we propose ControlSHAP, an approach based on the Monte Carlo technique of control variates. Our methodology is applicable to any machine learning model and requires virtually no extra computation or modeling effort. On several high-dimensional datasets, we find that it can produce dramatic reductions in the Monte Carlo variability of Shapley estimates.
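For readers unfamiliar with control variates, the following is a minimal, generic sketch of the technique on a toy integral. It is not the ControlSHAP procedure itself: the target function, the choice of control, and the names `control_variate_estimate` and `compare_variability` are all assumptions made for illustration. The idea is to pair a noisy Monte Carlo estimate with a correlated quantity whose true mean is known, then subtract the scaled error of that quantity.

```python
import math
import random
import statistics


def control_variate_estimate(f_samples, g_samples, g_mean):
    """Adjust the sample mean of f using a control g whose true mean is known.

    Hypothetical helper for illustration; not taken from the ControlSHAP paper.
    """
    f_bar = statistics.fmean(f_samples)
    g_bar = statistics.fmean(g_samples)
    # Estimated variance-minimizing coefficient: Cov(f, g) / Var(g).
    cov_fg = sum((f - f_bar) * (g - g_bar) for f, g in zip(f_samples, g_samples))
    var_g = sum((g - g_bar) ** 2 for g in g_samples)
    coef = cov_fg / var_g if var_g > 0 else 0.0
    # Subtract the control's observed error, scaled by the coefficient.
    return f_bar - coef * (g_bar - g_mean)


def compare_variability(n_samples=200, n_repeats=500, seed=0):
    """Return (variance of plain estimates, variance of adjusted estimates)."""
    rng = random.Random(seed)
    plain, adjusted = [], []
    for _ in range(n_repeats):
        u = [rng.random() for _ in range(n_samples)]
        # Toy target (an assumption): E[exp(U)] for U ~ Uniform(0, 1).
        f = [math.exp(x) for x in u]
        plain.append(statistics.fmean(f))
        # Control: U itself, whose mean 0.5 is known exactly.
        adjusted.append(control_variate_estimate(f, u, 0.5))
    return statistics.variance(plain), statistics.variance(adjusted)


plain_var, adjusted_var = compare_variability()
print(f"plain variance:    {plain_var:.3e}")
print(f"adjusted variance: {adjusted_var:.3e}")
```

Across repeated runs, the adjusted estimates should scatter much less than the plain sample means, because `exp(U)` and `U` are strongly correlated. The reduction depends entirely on that correlation, so a weakly correlated control would help far less.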