A multitude of explainability methods and associated fidelity performance metrics have been proposed to help us better understand how modern AI systems make decisions. However, much of this work has remained theoretical, with little consideration for the human end-user. In particular, it is not yet known (1) how useful current explainability methods are in practice in more real-world scenarios and (2) how well the associated performance metrics predict how much knowledge individual explanations contribute to a human end-user trying to understand the inner workings of the system. To fill this gap, we conducted psychophysics experiments at scale to evaluate the ability of human participants to leverage representative attribution methods for understanding the behavior of different image classifiers representing three real-world scenarios: identifying bias in an AI system, characterizing the visual strategy it uses for tasks that are too difficult for an untrained non-expert human observer, and understanding its failure cases. Our results demonstrate that the degree to which individual attribution methods help human participants better understand an AI system varies widely across these scenarios. This suggests a critical need for the field to move past quantitative improvements of current attribution methods towards the development of complementary approaches that provide qualitatively different sources of information to human end-users.