Post-hoc explanations are widely used to justify, contest, and review automated decisions in high-stakes domains such as lending, employment, and healthcare. Among these methods, SHAP is often treated as providing a reliable account of which features mattered for an individual prediction and is routinely used to support recourse, oversight, and accountability. In practice, however, SHAP explanations can differ substantially across repeated runs, even when the individual, the prediction task, and the trained model are held fixed. We conceptualize and name this phenomenon explanation multiplicity: the existence of multiple, internally valid but substantively different explanations for the same decision. Explanation multiplicity poses a normative challenge for responsible AI deployment, as it undermines the expectation that explanations reliably identify the reasons for an adverse outcome. We present a comprehensive methodology for characterizing explanation multiplicity in post-hoc feature attribution methods, disentangling sources that arise from model training and selection from stochasticity intrinsic to the explanation pipeline. We further show that whether explanation multiplicity is surfaced depends on how explanation consistency is measured: commonly used magnitude-based metrics can suggest stability while masking substantial instability in the identity and ordering of top-ranked features. To contextualize observed instability, we derive and estimate randomized baseline values under plausible null models, providing a principled reference point for interpreting explanation disagreement. Across datasets, model classes, and confidence regimes, we find that explanation multiplicity is widespread and persists even under highly controlled conditions, including high-confidence predictions. Explanation practices must therefore be evaluated with metrics and baselines aligned with their intended societal role.
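The contrast between magnitude-based and rank-based consistency metrics, and the role of a randomized baseline, can be sketched as follows. This is a minimal illustration on synthetic attribution vectors, not the paper's actual pipeline: the feature count, noise scale, and top-k cutoff are hypothetical choices, and the null model (uniformly random top-k sets) is one simple instance of the plausible null models the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 30, 5  # hypothetical: 30 features, compare top-5 sets

# Two hypothetical attribution vectors for the same prediction,
# differing only by small run-to-run noise in the explainer.
base = rng.normal(size=d)
run_a = base + rng.normal(scale=0.15, size=d)
run_b = base + rng.normal(scale=0.15, size=d)

def top_k(attr, k):
    """Indices of the k features with largest absolute attribution."""
    return set(np.argsort(-np.abs(attr))[:k])

def jaccard(a, b):
    return len(a & b) / len(a | b)

# Magnitude-based agreement: correlation of raw attribution values.
pearson = np.corrcoef(run_a, run_b)[0, 1]

# Rank-based agreement: overlap in the *identity* of top-k features.
observed = jaccard(top_k(run_a, k), top_k(run_b, k))

# Randomized baseline: expected top-k Jaccard under a null model in
# which feature rankings are uniformly random and independent.
baseline = np.mean([
    jaccard(set(rng.choice(d, k, replace=False)),
            set(rng.choice(d, k, replace=False)))
    for _ in range(10_000)
])

print(f"Pearson r:       {pearson:.2f}")
print(f"Top-{k} Jaccard:   {observed:.2f}")
print(f"Random baseline: {baseline:.2f}")
```

A high Pearson correlation here can coexist with a top-k Jaccard well below 1, which is the masking effect the abstract describes; the randomized baseline indicates how much overlap mere chance would produce, so observed agreement should be read relative to it rather than on an absolute scale.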