Feature attribution methods are widely used for explaining image-based predictions, as they provide feature-level insights that can be intuitively visualized. However, such explanations often vary in their robustness and may fail to faithfully reflect the reasoning of the underlying black-box model. To address these limitations, we propose a novel conformal prediction-based approach that enables users to directly control the fidelity of the generated explanations. The method identifies a subset of salient features that is sufficient to preserve the model's prediction, regardless of the information carried by the excluded features, and without requiring access to ground-truth explanations for calibration. Four conformity functions are proposed to quantify the extent to which explanations conform to the model's predictions. The approach is empirically evaluated using five explainers across six image datasets. The results demonstrate that FastSHAP consistently outperforms the competing methods in terms of both fidelity and informational efficiency, the latter measured by the size of the explanation regions. Furthermore, the results reveal that conformity measures based on super-pixels are more effective than their pixel-wise counterparts.
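The following is a minimal, illustrative sketch of how split conformal prediction could be used to calibrate explanation fidelity as described above; it is not the authors' implementation, and all function names, the choice of nonconformity score (one minus the predicted-class probability after masking to the top-k attributed features), and the thresholds are assumptions for illustration only.

```python
# Hypothetical sketch: split conformal calibration of explanation fidelity.
# Nonconformity = 1 - model probability of the originally predicted class when
# only the top-k attributed features are kept; all names here are illustrative.
import numpy as np

def keep_top_k(x, attribution, k, baseline=0.0):
    """Keep the k most salient features of a (flattened) image; fill the rest with a baseline value."""
    flat_x, flat_a = x.ravel().copy(), attribution.ravel()
    drop = np.argsort(flat_a)[:-k] if k > 0 else np.arange(flat_a.size)
    flat_x[drop] = baseline
    return flat_x.reshape(x.shape)

def calibrate_threshold(model_prob, cal_images, cal_attrs, k, alpha=0.1):
    """Compute the conformal threshold from nonconformity scores on a calibration set.

    model_prob(x) returns the model's probability for the originally predicted class of x.
    """
    scores = np.array([1.0 - model_prob(keep_top_k(x, a, k))
                       for x, a in zip(cal_images, cal_attrs)])
    n = len(scores)
    # Finite-sample-corrected (1 - alpha) empirical quantile, standard in split conformal prediction.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def smallest_sufficient_subset(model_prob, x, attribution, threshold, step=50):
    """Grow the salient-feature set until its nonconformity falls below the calibrated threshold."""
    for k in range(step, attribution.size + 1, step):
        if 1.0 - model_prob(keep_top_k(x, attribution, k)) <= threshold:
            return k  # size of a subset sufficient to preserve the prediction at level 1 - alpha
    return attribution.size
```

Under these assumptions, the returned subset size serves as the informational-efficiency measure referred to in the abstract: smaller subsets that still meet the calibrated threshold indicate more efficient explanations.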