The rapid progress of artificial intelligence (AI) has popularized deep learning models across domains, yet their inherent opacity poses challenges, notably in critical fields such as healthcare, medicine, and the geosciences. Explainable AI (XAI) has emerged to shed light on these "black box" models and to help decipher their decision-making process. However, different XAI methods yield markedly different explanations. This inter-method variability increases uncertainty and lowers trust in deep networks' predictions. In this study, we propose, for the first time, a framework designed to enhance the explainability of deep networks by maximizing both the accuracy and the comprehensibility of the explanations. Our framework integrates explanations from established XAI methods and employs a non-linear "explanation optimizer" to construct a unique and optimal explanation. We validate the approach through experiments on multi-class and binary classification tasks in 2D object and 3D neuroscience imaging. Our explanation optimizer achieved faithfulness scores that averaged 155% and 63% higher than those of the best-performing XAI method in the 3D and 2D applications, respectively. Our approach also yielded lower complexity, which increases comprehensibility. Our results suggest that optimal explanations can be derived under specific criteria, addressing the issue of inter-method variability in the current XAI literature.
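To make the notions of inter-method variability and explanation complexity concrete, the following is a minimal sketch and not the framework described above. It assumes each explanation is a flat list of per-feature attribution scores, measures complexity as the Shannon entropy of the normalized absolute attributions (one common choice; the abstract does not state which metric is used), and combines maps with a plain element-wise mean as a linear baseline, which stands in for, and is much simpler than, the non-linear explanation optimizer. All function names and the toy attribution values are hypothetical.

```python
import math


def normalize(attribution):
    """Scale absolute attributions so that they sum to 1."""
    magnitudes = [abs(a) for a in attribution]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("attribution map is all zeros")
    return [m / total for m in magnitudes]


def complexity(attribution):
    """Shannon entropy of the normalized map (assumed metric).

    Lower values mean the attribution mass sits on fewer features.
    """
    return -sum(p * math.log(p) for p in normalize(attribution) if p > 0)


def mean_pairwise_l1(attributions):
    """Average L1 distance between all pairs of normalized maps.

    Used here as a simple, assumed proxy for inter-method variability.
    """
    if len(attributions) < 2:
        raise ValueError("need at least two attribution maps")
    maps = [normalize(a) for a in attributions]
    distances = [
        sum(abs(x - y) for x, y in zip(maps[i], maps[j]))
        for i in range(len(maps))
        for j in range(i + 1, len(maps))
    ]
    return sum(distances) / len(distances)


def average_explanation(attributions):
    """Linear baseline: element-wise mean of the normalized maps."""
    maps = [normalize(a) for a in attributions]
    return [sum(values) / len(maps) for values in zip(*maps)]


# Toy attribution maps from three hypothetical XAI methods over four features.
method_a = [0.9, 0.1, 0.0, 0.0]
method_b = [0.0, 0.0, 0.5, 0.5]
method_c = [0.25, 0.25, 0.25, 0.25]
all_maps = [method_a, method_b, method_c]

print("variability:", mean_pairwise_l1(all_maps))
for name, attribution in zip("abc", all_maps):
    print("complexity of method", name, ":", complexity(attribution))
print("averaged map:", average_explanation(all_maps))
```

In this toy setting the three maps disagree strongly, so the pairwise distance is large, and the averaged map spreads its mass over every feature. A plain average therefore does not guarantee low complexity, which is one way to read the motivation for optimizing the combined explanation against explicit criteria.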