The widespread deployment of deep nets in practical applications has led to a growing desire to understand how and why such black-box methods make their predictions. Much work has focused on understanding which part of the input pattern (an image, say) is responsible for a particular class being predicted, and how the input may be manipulated to predict a different class. We focus instead on understanding which of the internal features computed by the neural net are responsible for a particular class. We achieve this by mimicking part of the neural net with an oblique decision tree having sparse weight vectors at the decision nodes. Using the recently proposed Tree Alternating Optimization (TAO) algorithm, we are able to learn trees that are both highly accurate and interpretable. Such trees can faithfully mimic the part of the neural net they replace, and hence they can provide insights into the deep net black box. Further, we show that we can easily manipulate the neural net features in order to make the net predict, or not predict, a given class, thus showing that it is possible to carry out adversarial attacks at the level of the features. These insights and manipulations apply globally to the entire training and test set, not just at a local (single-instance) level. We demonstrate this robustly on the MNIST and ImageNet datasets with LeNet5 and VGG networks.
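To make the idea of an oblique decision tree with sparse weight vectors concrete, the following is a minimal illustrative sketch, not the TAO algorithm and not the authors' implementation. The tree is built by hand rather than learned, and every name in it (`ObliqueNode`, `Leaf`, `predict`, `mask_features`), the feature indices, the weights, and the class labels are hypothetical. It assumes that each decision node thresholds a weighted sum over only a few features, and that zeroing a feature is one simple way to manipulate it; the manipulations used in the paper may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf


@dataclass
class ObliqueNode:
    # Sparse weight vector: only the listed feature indices have nonzero weight.
    weights: Dict[int, float]
    bias: float
    left: Union["ObliqueNode", Leaf]   # taken when the score is negative
    right: Union["ObliqueNode", Leaf]  # taken when the score is >= 0


def predict(node: Union[ObliqueNode, Leaf], features: List[float]) -> str:
    """Route a feature vector from the root down to a leaf and return its label."""
    while isinstance(node, ObliqueNode):
        score = node.bias + sum(w * features[i] for i, w in node.weights.items())
        node = node.right if score >= 0 else node.left
    return node.label


def mask_features(features: List[float], indices: Set[int]) -> List[float]:
    """Return a copy of the feature vector with the chosen features set to zero."""
    return [0.0 if i in indices else v for i, v in enumerate(features)]


# Hand-built toy tree over 5 hypothetical net features (not learned by TAO).
tree = ObliqueNode(
    weights={1: 1.0, 4: -2.0},
    bias=-0.5,
    right=Leaf("cat"),
    left=ObliqueNode(
        weights={2: 1.0},
        bias=-1.0,
        right=Leaf("dog"),
        left=Leaf("other"),
    ),
)

features = [0.3, 2.0, 1.5, 0.9, 0.1]
original = predict(tree, features)                      # root score 1.3
attacked = predict(tree, mask_features(features, {1}))  # root score -0.7
print(original, attacked)
```

Because the root node depends on only two features, it is easy to read off which features push an input toward a class, and zeroing feature 1 is enough to reroute this input to a different leaf. In this toy setting, that is the kind of feature-level manipulation the abstract describes.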