In this technical note, we aim to explain a deep neural network (DNN) by quantifying the interactions between input variables encoded by the DNN, which reflect its inference logic. Specifically, we first rethink the definition of interactions, and then formally define faithfulness and conciseness for interaction-based explanations. To this end, we propose two kinds of interactions, i.e., the AND interaction and the OR interaction. For faithfulness, we prove the uniqueness of the AND (OR) interaction in quantifying the effect of the AND (OR) relationship between input variables. Moreover, based on AND-OR interactions, we design techniques to boost the conciseness of the explanation without hurting its faithfulness. In this way, the inference logic of a DNN can be faithfully and concisely explained by a set of symbolic concepts.
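The abstract does not state the interaction formulas. As an illustration only, the sketch below assumes the AND interaction takes the Harsanyi-dividend form common in this line of work, I(S) = Σ_{T⊆S} (-1)^{|S|-|T|} v(x_T), where v(x_T) is the model output when only the variables in T are kept and the rest are masked. It shows the faithfulness property in the form of exact reconstruction, v(x_S) = Σ_{T⊆S} I(T), and the conciseness idea in the form of only a few non-zero interactions. The toy model `toy_v` and the helper names are hypothetical stand-ins for a real DNN and masking scheme; the OR interaction is not covered here.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List

Subset = FrozenSet[int]


def all_subsets(items: Iterable[int]) -> List[Subset]:
    """Enumerate every subset of `items` (including the empty set)."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def and_interactions(players: Iterable[int], v: Callable[[Subset], float]) -> Dict[Subset, float]:
    """Assumed Harsanyi-style AND interaction: I(S) = sum_{T <= S} (-1)^{|S|-|T|} v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in all_subsets(S))
        for S in all_subsets(players)
    }


def reconstruct(S: Subset, interactions: Dict[Subset, float]) -> float:
    """Faithfulness check: sum the interactions of all subsets T contained in S."""
    return sum(val for T, val in interactions.items() if T <= S)


# Hypothetical toy "model" over input variables {0, 1, 2}:
# it outputs 1 only when variables 0 AND 1 are both present, plus 2 when variable 2 is present.
PLAYERS = {0, 1, 2}


def toy_v(S: Subset) -> float:
    return (1 if {0, 1} <= S else 0) + (2 if 2 in S else 0)


interactions = and_interactions(PLAYERS, toy_v)
salient = {tuple(sorted(S)): val for S, val in interactions.items() if val != 0}
print(salient)
```

On this toy model only two interactions are non-zero, {0, 1} with effect 1 and {2} with effect 2, so the explanation is concise, and summing them over subsets recovers `toy_v` exactly on every input subset.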