The growth of explainable artificial intelligence as a research field has produced numerous methods for visualizing and understanding the black box of a machine learning model. Attribution maps are commonly used to highlight the parts of an input image that drive a model toward a specific decision. In parallel, the robustness of machine learning models to natural noise and adversarial attacks is being actively explored. This paper evaluates attribution-mapping methods to determine whether robust neural networks are more explainable, and we study this question in the context of medical image classification. Explainability research is at an impasse: there are many attribution-mapping methods, but no consensus on how to evaluate them or how to determine which perform best. Our experiments on multiple datasets (natural and medical images) and various attribution methods reveal that two popular evaluation metrics, Deletion and Insertion, have inherent limitations and yield contradictory results. We propose a new explainability faithfulness metric, EvalAttAI, that addresses the limitations of prior metrics. Using this new evaluation, we found that Bayesian deep neural networks using the Variational Density Propagation technique were consistently more explainable when paired with the best-performing attribution method, Vanilla Gradient. In general, however, robust neural networks of various types may not be more explainable, even though these models produce more visually plausible attribution maps.
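For readers unfamiliar with the two prior metrics, the sketch below shows Deletion and Insertion as they are commonly defined. It is an illustration under stated assumptions, not the authors' code and not EvalAttAI. Pixels are ranked by an attribution map. Deletion replaces the top-ranked pixels with a baseline and tracks the target-class probability, where a lower area under the curve (AUC) is better. Insertion copies the top-ranked pixels onto the baseline, where a higher AUC is better. The all-zeros baseline, the toy logistic "classifier", its weights `w`, and the helper names `deletion_insertion` and `trapezoid_auc` are hypothetical choices for illustration. For this linear toy model, the Vanilla Gradient of the logit with respect to the input is simply `w`.

```python
import numpy as np

def trapezoid_auc(scores):
    """Area under a curve sampled at evenly spaced pixel fractions in [0, 1]."""
    s = np.asarray(scores, dtype=float)
    dx = 1.0 / (len(s) - 1)
    return float(((s[:-1] + s[1:]) / 2.0).sum() * dx)

def deletion_insertion(model, image, attribution, baseline, steps=10):
    """Return (deletion_scores, insertion_scores) for one image.

    Pixels are ranked by attribution, highest first. Deletion swaps the top-k
    pixels of `image` for `baseline`; Insertion copies the top-k pixels of
    `image` onto `baseline`. `model` maps an image to a target-class probability.
    """
    order = np.argsort(-attribution.ravel(), kind="stable")
    flat_img, flat_base = image.ravel(), baseline.ravel()
    deletion, insertion = [], []
    for k in np.linspace(0, order.size, steps + 1).astype(int):
        top = order[:k]                      # indices of the k most important pixels
        deleted = flat_img.copy()
        deleted[top] = flat_base[top]        # remove them from the image
        inserted = flat_base.copy()
        inserted[top] = flat_img[top]        # reveal them on the baseline
        deletion.append(model(deleted.reshape(image.shape)))
        insertion.append(model(inserted.reshape(image.shape)))
    return np.array(deletion), np.array(insertion)

# Hypothetical toy setup: a logistic model on an 8x8 "image".
rng = np.random.default_rng(0)
w = rng.uniform(0.1, 1.0, size=(8, 8))       # assumed positive weights

def toy_model(x):
    logit = np.sum(w * x) - 0.5 * w.sum()
    return 1.0 / (1.0 + np.exp(-logit))

image = np.ones((8, 8))
baseline = np.zeros((8, 8))                  # assumed baseline; blurred images are also common
grad_attr = w                                # Vanilla Gradient of the logit for this linear model
rand_attr = rng.random((8, 8))               # random ranking for comparison

del_g, ins_g = deletion_insertion(toy_model, image, grad_attr, baseline)
del_r, ins_r = deletion_insertion(toy_model, image, rand_attr, baseline)
print("deletion AUC  grad vs random:", trapezoid_auc(del_g), trapezoid_auc(del_r))
print("insertion AUC grad vs random:", trapezoid_auc(ins_g), trapezoid_auc(ins_r))
```

On this toy model the gradient ranking gives a lower Deletion AUC and a higher Insertion AUC than a random ranking, as expected for a faithful attribution. The abstract's point is that on real networks these two scores can disagree with each other, which this linear example cannot reproduce.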