As opaque black-box predictive models become more prevalent, the need to develop interpretations for these models is of great interest. Variable importance and Shapley values are interpretability measures that apply to any predictive model and assess how much a variable or set of variables improves prediction performance. When the number of variables is large, estimating variable importance presents a significant computational challenge because re-training neural networks or other black-box algorithms requires substantial additional computation. In this paper, we address this challenge for algorithms trained by gradient descent or gradient boosting (e.g., neural networks and gradient-boosted decision trees). By combining early stopping of gradient-based methods with warm starts using the dropout method, we develop a scalable approach to estimate variable importance for any algorithm that can be expressed as an iterative kernel update equation. Importantly, we provide theoretical guarantees via the theory of early stopping for kernel-based methods, covering neural networks of sufficiently large (but not necessarily infinite) width and gradient-boosted decision trees that use symmetric trees as the weak learner. We also demonstrate the efficacy of our method through simulations and a real data example, which illustrate both the computational benefit of early stopping over fully re-training the model and the improved accuracy of our approach.
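A minimal sketch of the warm-start-plus-early-stopping idea, using a toy linear model trained by gradient descent (the model, step counts, and learning rate below are illustrative assumptions, not the paper's actual algorithm): a variable's importance is estimated by zeroing ("dropping out") that feature and continuing gradient descent for only a few steps from the already-trained weights, rather than re-training from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
beta = np.array([2.0, 1.0, 0.0, 0.0, 0.0])  # only the first two features matter
y = X @ beta + 0.1 * rng.normal(size=n)

def gd(X, y, w, steps, lr=0.01):
    """Plain gradient descent on squared-error loss, starting from w."""
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def mse(w, X):
    return np.mean((X @ w - y) ** 2)

w_full = gd(X, y, np.zeros(d), steps=500)  # full model, trained once
base = mse(w_full, X)

importance = {}
for j in range(d):
    Xj = X.copy()
    Xj[:, j] = 0.0                            # "dropout" of feature j
    w_j = gd(Xj, y, w_full.copy(), steps=25)  # warm start + early stopping
    importance[j] = mse(w_j, Xj) - base       # increase in loss = importance
```

In this toy setting the informative features (0 and 1) receive much larger importance than the noise features, at the cost of only a handful of extra gradient steps per variable instead of a full re-training run.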