The lack of interpretability of Deep Learning (DL) is a barrier to trustworthy AI. Despite great efforts by the Explainable AI (XAI) community, explanations lack robustness: indistinguishable input perturbations may lead to different XAI results. It is therefore vital to assess how robust DL interpretability is for a given XAI method. In this paper, we identify several challenges that the state of the art cannot cope with collectively: i) existing metrics are not comprehensive; ii) XAI techniques are highly heterogeneous; iii) misinterpretations are normally rare events. To tackle these challenges, we introduce two black-box evaluation methods, concerning the worst-case interpretation discrepancy and a probabilistic notion of overall robustness, respectively. A Genetic Algorithm (GA) with a bespoke fitness function is used to solve a constrained optimisation problem for efficient worst-case evaluation. Subset Simulation (SS), which is dedicated to estimating rare-event probabilities, is used to evaluate overall robustness. Experiments show that our methods outperform the state of the art in accuracy, sensitivity, and efficiency. Finally, we demonstrate two applications of our methods: ranking robust XAI methods and selecting training schemes to improve both classification and interpretation robustness.
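To illustrate why Subset Simulation suits rare-event estimation, the following is a minimal sketch on a toy problem, not the authors' implementation. It assumes standard Gaussian inputs and a hypothetical score function `toy_score` whose exceedance probability is known in closed form; in the setting of the abstract, the score would instead be a black-box measure of interpretation discrepancy under input perturbation. All names and parameters (`subset_simulation`, `p0`, `rho`) are illustrative, and the within-level sampler is a simple correlated Gaussian proposal chosen for brevity.

```python
import math
import random


def subset_simulation(g, dim, threshold, n=2000, p0=0.1, rho=0.8,
                      max_levels=20, seed=0):
    """Estimate P(g(X) >= threshold) for X ~ N(0, I) via nested levels."""
    rng = random.Random(seed)
    n_seeds = round(n * p0)          # samples kept as chain seeds per level
    chain_len = n // n_seeds         # states generated from each seed
    step = math.sqrt(1.0 - rho * rho)

    # Level 0: plain Monte Carlo from the input distribution.
    scored = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        scored.append((g(x), x))

    prob = 1.0
    for _ in range(max_levels):
        scored.sort(key=lambda t: t[0], reverse=True)
        level = scored[n_seeds - 1][0]   # intermediate threshold
        if level >= threshold:
            break                        # target event reached at this level
        prob *= p0                       # P(next level | current level) ~ p0
        next_scored = []
        for y, x in scored[:n_seeds]:
            for _ in range(chain_len):
                # Proposal that leaves N(0, I) invariant; accepting only
                # candidates above `level` targets the conditional law.
                cand = [rho * xi + step * rng.gauss(0.0, 1.0) for xi in x]
                cand_y = g(cand)
                if cand_y >= level:
                    x, y = cand, cand_y
                next_scored.append((y, x))
        scored = next_scored

    frac = sum(1 for y, _ in scored if y >= threshold) / len(scored)
    return prob * frac


def toy_score(x):
    # Hypothetical score: distributed as N(0, 1) when x ~ N(0, I).
    return sum(x) / math.sqrt(len(x))


def exact_tail(threshold):
    # Closed-form P(Z >= threshold) for a standard normal Z.
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


if __name__ == "__main__":
    print("subset simulation:", subset_simulation(toy_score, 2, 3.5))
    print("exact reference:  ", exact_tail(3.5))
```

The target probability here is on the order of 1e-4, so plain Monte Carlo with a few thousand samples would typically observe no events at all. Subset Simulation instead multiplies a sequence of larger conditional probabilities, each estimated from samples concentrated near the previous level.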