Interpreting black-box machine learning models is challenging because of their strong dependence on data and their inherently non-parametric nature. This paper reintroduces the concept of importance through the Marginal Variable Importance Metric (MVIM), a model-agnostic measure of predictor importance based on the true conditional expectation function. MVIM evaluates a predictor's influence on continuous or discrete outcomes. We propose a permutation-based estimator of MVIM, inspired by \citet{breiman2001random} and \citet{fisher2019all}. The MVIM estimator is biased when predictors are highly correlated, because black-box models struggle to extrapolate in low-probability regions. To understand the source and pattern of this bias under high correlation, we investigate the bias-variance decomposition of MVIM. To reduce the bias, we introduce a Conditional Variable Importance Metric (CVIM), adapted from \citet{strobl2008conditional}. Both MVIM and CVIM exhibit a quadratic relationship with the conditional average treatment effect (CATE).
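To make the permutation idea concrete, the following is a minimal sketch of a generic permutation-based importance estimate in the spirit of \citet{breiman2001random} and \citet{fisher2019all}. It is not the paper's exact MVIM estimator: the function name `permutation_importance`, the squared-error loss, the synthetic data, and the least-squares model standing in for a black-box learner are all assumptions made for illustration.

```python
import numpy as np


def permutation_importance(predict, X, y, column, n_repeats=20, seed=0):
    """Average increase in mean squared error after permuting one column.

    Hypothetical helper for illustration; `predict` is any fitted model's
    prediction function mapping an (n, p) array to n predictions.
    """
    rng = np.random.default_rng(seed)
    base_mse = np.mean((y - predict(X)) ** 2)
    increases = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        # Shuffling the column breaks its link to the outcome and to the
        # other predictors while keeping its marginal distribution.
        X_perm[:, column] = rng.permutation(X_perm[:, column])
        perm_mse = np.mean((y - predict(X_perm)) ** 2)
        increases.append(perm_mse - base_mse)
    return float(np.mean(increases))


# Synthetic data (assumed): two independent predictors, only the first matters.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 2))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=n)

# Ordinary least squares stands in for the black-box model.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)


def predict(X_new):
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef


importance_x0 = permutation_importance(predict, X, y, column=0)
importance_x1 = permutation_importance(predict, X, y, column=1)
print(f"x0: {importance_x0:.3f}, x1: {importance_x1:.3f}")
```

In this toy linear setting with independent predictors, the population value of the estimate for a predictor with coefficient $\beta$ is $2\beta^2 \mathrm{Var}(X)$, so the importance grows quadratically in the size of the effect. With highly correlated predictors, permuting one column produces combinations of predictor values that rarely occur in the training data, which is the extrapolation problem that motivates a conditional variant.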