Explainable artificial intelligence (xAI) has gained significant attention in recent years. In particular, explainability for deep neural networks has been a topic of intensive research, driven by the meteoric rise in prominence of deep neural networks and their "black-box" nature. xAI approaches can be characterized along different dimensions, such as their scope (global versus local explanations) or their underlying methodology (statistic-based versus rule-based strategies). Methods generating global explanations aim to provide a reasoning process applicable to all possible output classes, while local explanation methods focus only on a single, specific class. SHAP (SHapley Additive exPlanations), a well-known statistical technique, identifies the features most important to a network's predictions. Rule extraction methods for deep neural networks construct IF-THEN rules that link input conditions to a class. Another line of work generates counterfactuals, which explain how small changes to an input can affect the model's predictions. However, these techniques focus primarily on the input-output relationship and thus neglect the structure of the network when generating explanations. In this work, we propose xDNN(ASP), an explanation generation system for deep neural networks that provides global explanations. Given a neural network model and its training data, xDNN(ASP) extracts a logic program under answer set semantics that, in the ideal case, represents the trained model, i.e., the answer sets of the extracted program correspond one-to-one to input-output pairs of the network. We demonstrate experimentally, on two synthetic datasets, that the extracted logic program not only maintains a high level of accuracy on the prediction task but also provides valuable information for understanding the model, such as the importance of features and the impact of hidden nodes on the prediction.
The latter can be used as a guide for reducing the number of nodes used in hidden layers, i.e., providing a means for optimizing the network.
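The one-to-one correspondence described above amounts to a fidelity requirement: the extracted rules should reproduce the network's input-output behavior exactly. The following minimal Python sketch illustrates this check on a toy binary classifier; all names (`net_predict`, `RULES`, `fidelity`) are hypothetical illustrations, not the xDNN(ASP) implementation.

```python
def net_predict(x):
    # Stand-in for a trained binary classifier over two boolean features.
    return "pos" if x[0] and not x[1] else "neg"

# Extracted IF-THEN rules: (condition, class), checked in order.
RULES = [
    (lambda x: x[0] and not x[1], "pos"),
    (lambda x: True, "neg"),  # default rule
]

def rules_predict(x):
    # Apply the first rule whose condition holds for input x.
    for cond, cls in RULES:
        if cond(x):
            return cls

def fidelity(inputs):
    # Fraction of inputs on which the rule set agrees with the network;
    # 1.0 means the rules mirror the network exactly on these inputs.
    agree = sum(rules_predict(x) == net_predict(x) for x in inputs)
    return agree / len(inputs)

inputs = [(a, b) for a in (False, True) for b in (False, True)]
print(fidelity(inputs))  # → 1.0 when the rules exactly mirror the network
```

In xDNN(ASP) the rule set is a logic program under answer set semantics rather than an ordered Python list, but the evaluation criterion is the same: exact agreement with the network on the input space.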