Input gradients play a pivotal role in a variety of applications, including adversarial attack algorithms for evaluating model robustness, explainable AI techniques for generating Saliency Maps, and counterfactual explanations. However, Saliency Maps generated by traditional neural networks are often noisy and provide limited insights. In this paper, we demonstrate that, on the contrary, the Saliency Maps of 1-Lipschitz neural networks, learned with the dual loss of an optimal transportation problem, exhibit desirable XAI properties: they are highly concentrated on the essential parts of the image with low noise, significantly outperforming state-of-the-art explanation approaches across various models and metrics. We also show that these maps align unprecedentedly well with human explanations on ImageNet. To explain the particularly beneficial properties of the Saliency Map for such models, we prove that this gradient encodes both the direction of the transportation plan and the direction towards the nearest adversarial attack. Following the gradient down to the decision boundary is therefore no longer considered an adversarial attack, but rather a counterfactual explanation that explicitly transports the input from one class to another. Thus, learning with such a loss jointly optimizes the classification objective and the alignment of the gradient, i.e. the Saliency Map, with the transportation plan direction. These networks were previously known to be certifiably robust by design; we demonstrate that they scale well to large problems and models, and that they are tailored for explainability using a fast and straightforward method.
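The dual optimal-transport loss mentioned above can be sketched as follows. This is a minimal sketch assuming the hinge-regularized Kantorovich-Rubinstein (hKR) form with labels in {+1, -1}; the function name `hkr_loss` and the default margin and weight values are hypothetical. The KR term is a valid estimate of the Wasserstein-1 distance between the two classes only if the network is actually 1-Lipschitz, which this snippet does not enforce.

```python
import numpy as np


def hkr_loss(scores, labels, margin=1.0, lam=10.0):
    """Hinge-regularized Kantorovich-Rubinstein (hKR) loss, sketched.

    scores: outputs f(x) of an (assumed) 1-Lipschitz network, shape (n,)
    labels: +1 / -1 class labels, shape (n,)
    margin, lam: hypothetical defaults for the margin m and hinge weight lambda
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Dual (Kantorovich-Rubinstein) estimate of the transport cost between
    # the classes: mean score on class +1 minus mean score on class -1.
    kr = scores[labels > 0].mean() - scores[labels < 0].mean()
    # Hinge term pushing each sample to the correct side of f = 0 with margin m.
    hinge = np.maximum(0.0, margin - labels * scores).mean()
    # Minimizing -kr maximizes the transport estimate; the hinge drives classification.
    return -kr + lam * hinge


print(hkr_loss([2.0, 0.5, -1.0, 0.2], [1, 1, -1, -1]))  # approximately 2.6
```

Here the KR term equals 1.25 - (-0.4) = 1.65 and the mean hinge is 0.425, so the loss is -1.65 + 10 * 0.425.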
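The claim that following the gradient to the decision boundary yields a counterfactual can be illustrated on toy functions. The sketch below assumes a 1-Lipschitz score f whose gradient has unit norm (as for a signed distance function), so that |f(x)| is the distance to the boundary f = 0 and the point x - f(x)∇f(x) lies on it. The helper `counterfactual` and the two toy classifiers are hypothetical illustrations, not trained networks; for a trained network, whether a single step reaches the boundary would need to be checked rather than assumed.

```python
import numpy as np


def counterfactual(f, grad_f, x):
    """Step from x along the Saliency Map (the input gradient) to f = 0.

    Assumes ||grad f|| = 1 along the segment, as for a signed distance
    function; then the step length |f(x)| is exactly the distance to the boundary.
    """
    return x - f(x) * grad_f(x)


# Toy classifier 1: signed distance to a hyperplane (unit-norm weight vector).
W = np.array([1.0, 1.0]) / np.sqrt(2.0)
B = -1.0


def f_lin(x):
    return float(W @ x + B)


def grad_lin(x):
    return W


# Toy classifier 2: signed distance to a circle of radius R (nonlinear).
R = 1.0


def f_ball(x):
    return float(np.linalg.norm(x) - R)


def grad_ball(x):
    return x / np.linalg.norm(x)


for f, g, x in [(f_lin, grad_lin, np.array([2.0, 2.0])),
                (f_ball, grad_ball, np.array([3.0, 4.0]))]:
    x_cf = counterfactual(f, g, x)
    # f(x_cf) is (numerically) zero, and ||x_cf - x|| equals |f(x)|.
    print(f(x), f(x_cf), np.linalg.norm(x_cf - x))
```

The second case shows the step is exact even for a nonlinear boundary, because a signed distance function keeps a unit-norm gradient along the straight path to its nearest boundary point.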