A major challenge in Explainable AI is correctly interpreting the activations of hidden neurons: accurate interpretations would provide insight into what a deep learning system has internally detected as relevant in the input, demystifying the otherwise black-box character of deep learning systems. The state of the art indicates that hidden node activations can, in some cases, be interpreted in a way that makes sense to humans, but systematic automated methods that can hypothesize and verify interpretations of hidden neuron activations are underexplored. In this paper, we provide such a method and demonstrate that it yields meaningful interpretations. Our approach combines large-scale background knowledge (approximately 2 million classes curated from the Wikipedia concept hierarchy) with a symbolic reasoning approach called Concept Induction, which is based on description logics and was originally developed for applications in the Semantic Web field. Our results show that, through a hypothesis and verification process, we can automatically attach meaningful labels from the background knowledge to individual neurons in the dense layer of a Convolutional Neural Network.