Deep neural networks are powerful image classifiers, but they often operate as "black boxes," which makes their decision-making processes hard to understand. Various explanation methods, particularly those that generate saliency maps, aim to address this challenge. However, inconsistencies in faithfulness metrics hinder reliable benchmarking of these methods. This paper takes an approach inspired by psychometrics, using Krippendorff's alpha to quantify the benchmark reliability of post-hoc explanation methods in image classification. It proposes two modifications to model training, feeding perturbed samples and employing focal loss, to enhance robustness and calibration. Empirical evaluations demonstrate significant improvements in benchmark reliability across metrics, datasets, and post-hoc methods. This work lays a foundation for more reliable evaluation of post-hoc explanation methods and emphasizes the importance of model robustness in the assessment process.
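To make the two named ingredients concrete, the following is a minimal sketch rather than the paper's implementation. It assumes the standard single-sample focal loss, -(1 - p_t)^gamma * log(p_t), and the interval-scale form of Krippendorff's alpha with no missing values; the abstract does not state which variant of alpha, which gamma, or which rater/unit layout the authors use. The function names `focal_loss` and `krippendorff_alpha_interval` and the `ratings[r][u]` layout are hypothetical.

```python
import math
from itertools import permutations


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    probs  : predicted class probabilities (assumed to sum to 1)
    target : index of the true class
    gamma  : focusing parameter (assumed default); gamma = 0 gives cross-entropy
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def krippendorff_alpha_interval(ratings):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    ratings[r][u] is the score that rater r assigns to unit u.
    Assumes every rater scores every unit (no missing values).
    """
    n_raters = len(ratings)
    n_units = len(ratings[0])
    values = [ratings[r][u] for u in range(n_units) for r in range(n_raters)]
    n = len(values)

    # Observed disagreement: squared differences within each unit.
    d_o = 0.0
    for u in range(n_units):
        column = [ratings[r][u] for r in range(n_raters)]
        pair_sum = sum((a - b) ** 2 for a, b in permutations(column, 2))
        d_o += pair_sum / (n_raters - 1)
    d_o /= n

    # Expected disagreement: squared differences across all pooled values.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all values are identical")

    return 1.0 - d_o / d_e
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy, and larger values of `gamma` down-weight samples the model already classifies confidently. An alpha of 1 indicates perfect agreement among raters, while values near or below 0 indicate agreement no better than chance.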