Trust and ethical concerns arising from the widespread deployment of opaque machine learning (ML) models motivate the need for reliable model explanations. Post-hoc model-agnostic explanation methods address this challenge by learning a surrogate model that approximates the behavior of the deployed black-box ML model in the locality of a sample of interest. In post-hoc scenarios, neither the underlying model parameters nor the training data are available; hence, this local neighborhood must be constructed by generating perturbed inputs around the sample of interest and querying their corresponding model predictions. We propose \emph{Expected Active Gain for Local Explanations} (\texttt{EAGLE}), a post-hoc model-agnostic explanation framework that formulates perturbation selection as an information-theoretic active learning problem. By adaptively sampling perturbations that maximize the expected information gain, \texttt{EAGLE} efficiently learns a linear surrogate explainable model while producing feature importance scores along with uncertainty/confidence estimates. Theoretically, we establish that the cumulative information gain scales as $\mathcal{O}(d \log t)$, where $d$ is the feature dimension and $t$ is the number of samples, and that the sample complexity grows linearly with $d$ and logarithmically with the inverse confidence parameter $1/\delta$. Empirical results on tabular and image datasets corroborate our theoretical findings and demonstrate that \texttt{EAGLE} improves explanation reproducibility across runs, achieves higher neighborhood stability, and improves perturbation sample quality compared to state-of-the-art baselines such as Tilia, US-LIME, GLIME, and BayesLIME.
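The adaptive sampling loop sketched in the abstract can be illustrated with a minimal example. This is not the authors' \texttt{EAGLE} implementation: it assumes a Bayesian linear surrogate with a Gaussian prior and known noise variance, scores each candidate perturbation by the closed-form posterior-entropy reduction $\tfrac{1}{2}\log(1 + x^\top \Sigma x / \sigma^2)$, and uses a hypothetical `black_box` stand-in for the deployed model; the candidate-generation scheme and all hyperparameters are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box model to explain (a stand-in; the method is model-agnostic).
def black_box(X):
    return np.tanh(X @ np.array([2.0, -1.0, 0.5]))

d = 3                              # feature dimension
sigma2 = 0.1                       # assumed observation-noise variance of the surrogate
x0 = np.array([0.5, -0.2, 1.0])    # sample of interest

# Bayesian linear surrogate: prior w ~ N(0, I); posterior tracked in closed form.
Sigma = np.eye(d)                  # posterior covariance
mu = np.zeros(d)                   # posterior mean

for t in range(30):
    # Candidate perturbations drawn around the sample of interest.
    cand = x0 + 0.3 * rng.standard_normal((64, d))
    # Expected information gain of querying x: 0.5 * log(1 + x^T Sigma x / sigma2),
    # the entropy reduction of the Gaussian posterior over w.
    gains = 0.5 * np.log1p(np.einsum('ij,jk,ik->i', cand, Sigma, cand) / sigma2)
    x = cand[np.argmax(gains)]             # greedily pick the most informative perturbation
    y = black_box(x[None])[0]              # query the black box
    # Rank-one conjugate posterior update (Sherman-Morrison).
    Sx = Sigma @ x
    denom = sigma2 + x @ Sx
    mu = mu + Sx * (y - x @ mu) / denom
    Sigma = Sigma - np.outer(Sx, Sx) / denom

importance = mu                    # feature-importance scores
stderr = np.sqrt(np.diag(Sigma))   # per-feature uncertainty estimates
print(importance, stderr)
```

Greedy maximization of this gain concentrates queries along the directions where the surrogate is most uncertain, which is what drives the $\mathcal{O}(d \log t)$ cumulative-gain behavior stated above for the Gaussian linear case.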