The field of "explainable artificial intelligence" (XAI) seemingly addresses the desire that decisions of machine learning systems should be human-understandable. However, in its current state, XAI itself needs scrutiny. Popular methods cannot reliably answer relevant questions about ML models, their training data, or test inputs, because they systematically attribute importance to input features that are independent of the prediction target. This limits the utility of XAI for diagnosing and correcting data and models, for scientific discovery, and for identifying intervention targets. The fundamental reason for this is that current XAI methods do not address well-defined problems and are not evaluated against targeted criteria of explanation correctness. Researchers should formally define the problems they intend to solve and design methods accordingly. This will lead to diverse use-case-dependent notions of explanation correctness and objective metrics of explanation performance that can be used to validate XAI algorithms.
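The claim that importance can be attributed to input features that are independent of the prediction target can be made concrete with a toy example. The sketch below is an illustration under stated assumptions, not a method taken from the text: it builds a hypothetical two-feature dataset in which the second feature is pure noise, generated independently of the target, yet the best linear model assigns it a nonzero weight because it helps cancel the noise in the first feature. The function name `suppressor_demo` and all variable names are invented for this illustration.

```python
import numpy as np


def suppressor_demo(n_samples=10_000, seed=0):
    """Fit a linear model on toy data with one target-independent feature.

    Assumed toy setup (hypothetical, for illustration only):
      z  : signal that fully determines the target
      d  : distractor noise, drawn independently of the target
      x1 = z + d   (signal contaminated by the distractor)
      x2 = d       (distractor only, independent of the target)
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    d = rng.standard_normal(n_samples)

    y = z                               # target depends on the signal only
    X = np.column_stack([z + d, d])     # columns: x1, x2

    # Ordinary least squares: the exact solution here is y = x1 - x2.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Empirical correlation between the distractor feature and the target.
    corr_x2_y = np.corrcoef(X[:, 1], y)[0, 1]
    return w, corr_x2_y


weights, corr = suppressor_demo()
print("model weights (x1, x2):", weights)
print("correlation of x2 with target:", corr)
```

In this setup, any attribution that reads importance off the model weights or gradients would flag `x2` as relevant, even though it carries no information about the target on its own. Whether that attribution counts as "correct" depends on the question being asked, which is why a formal problem definition is needed before an explanation can be validated.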