Explainable Artificial Intelligence (XAI), often defined as determining which features are most important to a model's prediction, plays a crucial role in enabling human understanding of, and trust in, deep learning systems. As models grow larger and become more ubiquitous and pervasive in daily life, explainability is necessary to avoid or minimize the adverse effects of model mistakes. Unfortunately, current approaches in human-centric XAI (e.g., predictive tasks in healthcare, education, or personalized ads) tend to rely on a single explainer. This trend is particularly concerning given that recent work has identified systematic disagreement among explainability methods applied to the same data points and the same underlying black-box models. In this paper, we therefore present a call to action to address the limitations of current state-of-the-art explainers. We propose shifting from post-hoc explainability to designing interpretable neural network architectures, moving away from approximation techniques in human-centric and high-impact applications. We identify five needs of human-centric XAI (real-time, accurate, actionable, human-interpretable, and consistent) and propose two schemes for interpretable-by-design neural network workflows (adaptive routing for interpretable conditional computation, and diagnostic benchmarks for iterative model learning). We postulate that the future of human-centric XAI lies neither in explaining black boxes nor in reverting to traditional interpretable models, but in neural networks that are intrinsically interpretable.