Explainable AI (xAI) methods are important for establishing trust in black-box models. However, criticism has recently mounted against current xAI methods: they disagree with one another, are necessarily false, and can be manipulated, which has begun to undermine the deployment of black-box models. Rudin (2019) goes so far as to say that we should stop using black-box models altogether in high-stakes cases because xAI explanations "must be wrong". However, strict fidelity to the truth has historically not been a desideratum in science. Idealizations, the intentional distortions introduced into scientific theories and models, are commonplace in the natural sciences and are seen as a successful scientific tool. Thus, it is not falsehood qua falsehood that is the issue. In this paper, I outline the need for xAI research to engage in idealization evaluation. Drawing on the use of idealizations in the natural sciences and in the philosophy of science, I introduce a novel framework for evaluating whether xAI methods engage in Successful Idealizations or Deceptive Explanations (SIDEs). SIDEs evaluates whether the limitations of xAI methods, and the distortions they introduce, can be part of a successful idealization or are instead deceptive distortions, as critics suggest. I discuss the role that existing research can play in idealization evaluation and where innovation is necessary. Through a qualitative analysis, I find that leading feature importance methods and counterfactual explanation methods are subject to idealization failure, and I suggest remedies for ameliorating these failures.