As black-box machine learning models grow more complex and are deployed in high-stakes settings, explaining their predictions becomes imperative. Local Interpretable Model-agnostic Explanations (LIME) [22] is a widely adopted method for understanding model behavior, but it is unstable with respect to random seeds [35,24,3] and exhibits low local fidelity (i.e., how well the explanation approximates the model's local behavior) [21,16]. Our study shows that the instability stems from small sample weights, which let the regularization term dominate and slow convergence. In addition, LIME's sampling neighborhood is non-local and biased towards the reference, resulting in poor local fidelity and sensitivity to the choice of reference. To address these problems, we introduce GLIME, an enhanced framework that extends LIME and unifies several prior methods. Within GLIME, we derive an equivalent formulation of LIME that converges significantly faster and is more stable. By employing a local and unbiased sampling distribution, GLIME generates explanations with higher local fidelity than LIME, and these explanations are independent of the reference choice. Moreover, GLIME lets users choose a sampling distribution suited to their specific scenario.
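The claim that small sample weights let regularization dominate can be illustrated with a toy weighted ridge regression. The sketch below is not the authors' implementation and does not use the LIME library. The linear "black box", the kernel `exp(-removed / sigma**2)`, and the names `weighted_ridge` and `explain` are assumptions chosen only for illustration.

```python
import numpy as np

def weighted_ridge(Z, y, weights, lam=1.0):
    """Closed-form minimizer of sum_i weights[i] * (y[i] - Z[i] @ w)**2 + lam * ||w||**2."""
    d = Z.shape[1]
    ZtW = Z.T * weights  # scales sample i (column i of Z.T) by weights[i]
    return np.linalg.solve(ZtW @ Z + lam * np.eye(d), ZtW @ y)

def explain(sigma, seed, d=30, n=1000, lam=1.0):
    """Fit a LIME-style local surrogate on binary perturbations (toy setup, assumed)."""
    rng = np.random.default_rng(seed)
    true_coef = np.linspace(1.0, 2.0, d)               # toy black box: f(z) = z @ true_coef
    Z = rng.integers(0, 2, size=(n, d)).astype(float)  # uniform binary perturbation masks
    y = Z @ true_coef
    removed = d - Z.sum(axis=1)                        # number of features switched off
    weights = np.exp(-removed / sigma**2)              # assumed exponential kernel
    return weighted_ridge(Z, y, weights, lam)

w_narrow = explain(sigma=0.25, seed=0)  # narrow kernel: sample weights are tiny
w_wide = explain(sigma=5.0, seed=0)     # wide kernel: sample weights are O(1)

print("coefficient norm, narrow kernel:", np.linalg.norm(w_narrow))
print("coefficient norm, wide kernel:  ", np.linalg.norm(w_wide))
```

Under these assumptions, the narrow kernel assigns almost every sample a weight so small that the penalty `lam * ||w||**2` outweighs the data term, and the fitted coefficients collapse towards zero. With the wide kernel the data term dominates and the fit stays close to the toy model's coefficients.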