Limited interpretability of Deep Learning (DL) is a barrier to trustworthy AI. Despite great efforts by the Explainable AI (XAI) community, explanations lack robustness: indistinguishable input perturbations may lead to different XAI results. Thus, given an XAI method, it is vital to assess how robust DL interpretability is. In this paper, we identify several challenges that the state of the art cannot address collectively: i) existing metrics are not comprehensive; ii) XAI techniques are highly heterogeneous; iii) misinterpretations are typically rare events. To tackle these challenges, we introduce two black-box evaluation methods, concerning the worst-case interpretation discrepancy and a probabilistic notion of overall robustness, respectively. A Genetic Algorithm (GA) with a bespoke fitness function solves a constrained optimisation problem for efficient worst-case evaluation. Subset Simulation (SS), a method dedicated to estimating rare-event probabilities, is used to evaluate overall robustness. Experiments show that our methods outperform the state of the art in accuracy, sensitivity, and efficiency. Finally, we demonstrate two applications of our methods: ranking XAI methods by robustness and selecting training schemes that improve both classification and interpretation robustness.
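Subset Simulation makes rare events tractable. It estimates a tiny probability as a product of larger conditional probabilities, one per nested intermediate level. The code below is a minimal, generic sketch of the textbook scheme with a component-wise (modified) Metropolis sampler. It is not the authors' implementation. It assumes standard-normal inputs and a toy linear limit-state function `g` whose exact tail probability is known, so the estimate can be checked. In the setting described above, `g` would instead measure interpretation discrepancy under input perturbation. The function name `subset_simulation` and all parameter values (`n`, `p0`, `sigma`, `max_levels`) are hypothetical choices for illustration.

```python
import math
import random


def subset_simulation(g, dim, threshold, n=2000, p0=0.1, sigma=1.0,
                      max_levels=20, seed=0):
    """Estimate P[g(X) >= threshold] for X ~ N(0, I_dim) by Subset Simulation.

    Hypothetical illustration: generic SS with standard-normal inputs,
    not the method of any specific paper or library.
    """
    rng = random.Random(seed)
    n_seeds = max(1, round(p0 * n))   # top samples kept as MCMC seeds per level
    chain_len = n // n_seeds          # states generated from each seed
    # Level 0: direct Monte Carlo from the input distribution.
    samples = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    values = [g(x) for x in samples]
    prob = 1.0
    for _ in range(max_levels):
        order = sorted(range(len(values)), key=values.__getitem__, reverse=True)
        # Intermediate threshold: value of the n_seeds-th largest sample.
        level = values[order[n_seeds - 1]]
        if level >= threshold:
            # Final level: the target event is no longer rare among these samples.
            hits = sum(1 for v in values if v >= threshold)
            return prob * hits / len(values)
        prob *= p0  # conditional probability of this level is fixed at p0
        seeds = [(samples[i], values[i]) for i in order[:n_seeds]]
        samples, values = [], []
        for x, gx in seeds:
            for _ in range(chain_len):
                # Modified Metropolis: perturb each coordinate independently and
                # accept it with the ratio of standard-normal densities.
                cand = list(x)
                for k in range(dim):
                    xi = x[k] + rng.gauss(0.0, sigma)
                    if rng.random() < math.exp(min(0.0, 0.5 * (x[k] ** 2 - xi ** 2))):
                        cand[k] = xi
                g_cand = g(cand)
                # Accept the move only if it stays inside the current subset.
                if g_cand >= level:
                    x, gx = cand, g_cand
                samples.append(x)
                values.append(gx)
    raise RuntimeError("threshold not reached within max_levels")


if __name__ == "__main__":
    d, b = 10, 3.5
    g = lambda x: sum(x) / math.sqrt(d)  # toy stand-in for a discrepancy metric
    exact = 0.5 * math.erfc(b / math.sqrt(2))  # P[N(0,1) >= b]
    print(subset_simulation(g, d, b), exact)
```

With `d = 10` and `b = 3.5`, the exact probability is about 2.3e-4. Plain Monte Carlo with the same 2000 samples would often observe no hits at all. SS instead reaches the event through a few intermediate levels, each holding a conditional probability of `p0`.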