Multiple machine learning models can achieve near-equivalent predictive performance on the same task yet provide divergent feature-based explanations. This is known as the Rashomon effect of (explainable) machine learning, and it raises the question of which explanations, if any, are trustworthy. We propose a framework based on metamorphic testing that assesses explanation faithfulness without requiring ground-truth labels by examining the feature importance attributed by post-hoc explanation methods. Five metamorphic relations formalize the consistency expected between model behavior and feature attributions. To demonstrate the approach, we apply this general framework to two tabular regression datasets and two post-hoc explainers (SHAP and LIME). The framework offers a practical, model-agnostic tool for selecting accurate models whose explanations are reliable and trustworthy.
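To make the idea of a metamorphic relation between model behavior and feature attributions concrete, the following is a minimal sketch of one such check. It is an illustration only: the relation shown (a feature the model ignores should receive near-zero attribution) is a hypothetical example and is not claimed to be one of the five relations of the framework. The toy model `predict`, the occlusion-based stand-in explainer `occlusion_attribution`, the deliberately faulty `constant_attribution`, and the helper `check_unused_feature_relation` are all assumed names introduced here; SHAP and LIME are not used.

```python
import random


def predict(x):
    # Toy regression model (assumption): uses features 0 and 1, ignores feature 2.
    return 3.0 * x[0] - 2.0 * x[1] + 0.0 * x[2]


def occlusion_attribution(model, x, baseline):
    # Stand-in explainer (assumption, not SHAP or LIME): the attribution of
    # feature j is the prediction change when x[j] is replaced by baseline[j].
    base_pred = model(x)
    attributions = []
    for j in range(len(x)):
        x_occluded = list(x)
        x_occluded[j] = baseline[j]
        attributions.append(base_pred - model(x_occluded))
    return attributions


def constant_attribution(model, x, baseline):
    # Deliberately unfaithful explainer: every feature gets attribution 1.0.
    return [1.0] * len(x)


def check_unused_feature_relation(model, explainer, data, baseline,
                                  feature, delta=1.0, tol=1e-9):
    # Hypothetical metamorphic relation: if shifting `feature` by `delta`
    # leaves the prediction unchanged, the attribution of that feature
    # should be (close to) zero. Returns the number of violating samples.
    violations = 0
    for x in data:
        x_shifted = list(x)
        x_shifted[feature] = x[feature] + delta
        model_insensitive = abs(model(x) - model(x_shifted)) <= tol
        attribution = explainer(model, x, baseline)[feature]
        if model_insensitive and abs(attribution) > tol:
            violations += 1
    return violations


# Synthetic inputs; no ground-truth labels are needed for the check.
rng = random.Random(0)
data = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(50)]
baseline = [0.0, 0.0, 0.0]

faithful = check_unused_feature_relation(
    predict, occlusion_attribution, data, baseline, feature=2)
unfaithful = check_unused_feature_relation(
    predict, constant_attribution, data, baseline, feature=2)
print("violations (occlusion explainer):", faithful)
print("violations (constant explainer):", unfaithful)
```

The check compares only the model's behavior under a controlled input change with what the explainer reports, which is why no labels are required. In practice, the same pattern could be run over several near-equivalent models and explainers, and the violation counts used as one signal when choosing among them.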