The field of 'explainable' artificial intelligence (XAI) has produced highly cited methods that seek to make the decisions of complex machine learning (ML) models 'understandable' to humans, for example by attributing 'importance' scores to input features. Yet a lack of formal underpinning leaves it unclear what conclusions can safely be drawn from the results of a given XAI method, and has so far hindered both the theoretical verification and the empirical validation of XAI methods. In particular, this gap remains unaddressed for challenging non-linear problems, which are typically solved by deep neural networks. Here, we craft benchmark datasets for three different non-linear classification scenarios in which the important class-conditional features are known by design and serve as ground-truth explanations. Using novel quantitative metrics, we benchmark the explanation performance of a wide set of XAI methods across three deep learning model architectures. We show that popular XAI methods are often unable to significantly outperform random performance baselines and edge detection methods. Moreover, we demonstrate that explanations derived from different model architectures can be vastly different and are thus prone to misinterpretation even under controlled conditions.
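To make the idea of scoring an explanation against a known ground truth concrete, the following is a minimal sketch and not the metric used in this work, which the abstract does not specify. It assumes a simple top-k precision: the fraction of the k most strongly attributed features that fall inside the ground-truth mask, with k set to the mask size. The function names `topk_precision` and `random_baseline`, the 8x8 toy image, and the Gaussian random attribution are all hypothetical choices made for illustration.

```python
import numpy as np


def topk_precision(attribution, truth_mask):
    """Fraction of the k largest-|attribution| features inside the mask (k = mask size).

    Illustrative metric only; an assumption, not the paper's definition.
    """
    scores = np.abs(np.asarray(attribution, dtype=float)).ravel()
    mask = np.asarray(truth_mask).ravel().astype(bool)
    k = int(mask.sum())
    if k == 0:
        raise ValueError("ground-truth mask is empty")
    # Indices of the k highest scores (order within the top k does not matter).
    top_k = np.argpartition(scores, -k)[-k:]
    return float(mask[top_k].mean())


def random_baseline(truth_mask, n_draws=200, seed=0):
    """Mean top-k precision of purely random attribution maps (hypothetical baseline)."""
    rng = np.random.default_rng(seed)
    shape = np.asarray(truth_mask).shape
    draws = [
        topk_precision(rng.standard_normal(shape), truth_mask)
        for _ in range(n_draws)
    ]
    return float(np.mean(draws))


# Toy example: an 8x8 "image" whose important features are a 2x2 patch by design.
truth = np.zeros((8, 8), dtype=bool)
truth[2:4, 2:4] = True

perfect_explanation = truth.astype(float)  # attributes importance only to the patch
print("perfect explanation:", topk_precision(perfect_explanation, truth))
print("random baseline:    ", random_baseline(truth))
```

Under these assumptions, an explanation that highlights exactly the designed patch scores 1.0, while random attributions score close to the fraction of the image covered by the mask (4 of 64 pixels here). An XAI method is only informative to the extent that it clearly exceeds such a baseline.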