The surge in black-box AI models has created a need to explain their internal mechanisms and justify their reliability, especially in high-stakes applications such as healthcare and autonomous driving. Because explainable AI (XAI) lacks a rigorous definition, a plethora of research on explainability, interpretability, and transparency has emerged to explain and analyze models from various perspectives. Consequently, given the exhaustive list of papers, it is challenging to form a comprehensive overview of XAI research in all its aspects. Considering the popularity of neural networks in AI research, we narrow our focus to a specific area of XAI: gradient-based explanations, which can be directly applied to neural network models. In this review, we systematically survey gradient-based explanation methods to date and introduce a novel taxonomy that categorizes them into four distinct classes. We then present the essential technical details in chronological order, highlighting the evolution of the algorithms. Next, we introduce both human and quantitative evaluations for measuring algorithm performance. More importantly, we discuss general challenges in XAI as well as challenges specific to gradient-based explanations. We hope this survey helps researchers understand state-of-the-art progress and its corresponding shortcomings, and that it sparks their interest in addressing these issues in future work.
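To make the object of study concrete, here is a minimal illustrative sketch (not taken from the survey) of the simplest gradient-based explanation, often called a "vanilla gradient" saliency map: the importance of each input feature is taken to be the magnitude of the model output's gradient with respect to that feature. For a tiny single-logit model f(x) = sigmoid(w·x) the input gradient has a closed form, so no autodiff library is needed; the function names and values below are purely hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def saliency(w, x):
    """Vanilla gradient saliency |df/dx_i| for f(x) = sigmoid(w . x).

    Since df/dx_i = sigmoid'(w . x) * w_i, the attribution of feature i
    is proportional to |w_i|, scaled by the sigmoid's local slope.
    """
    z = sum(wi * xi for wi, xi in zip(w, x))
    s = sigmoid(z)
    grad_scale = s * (1.0 - s)  # derivative of sigmoid evaluated at z
    return [grad_scale * abs(wi) for wi in w]

# Hypothetical weights and input: feature 0 carries the largest weight,
# feature 2 carries none, so its attribution is exactly zero.
w = [2.0, -0.5, 0.0]
x = [1.0, 1.0, 1.0]
print(saliency(w, x))
```

For deep networks the same quantity is obtained by backpropagating from the output logit to the input; the methods the survey categorizes (e.g., smoothed, integrated, or layer-wise variants) refine this basic recipe to address its known weaknesses, such as gradient saturation and noise.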