In natural language processing (NLP), deep neural networks (DNNs) can model complex contextual interactions and have achieved impressive results on a range of NLP tasks. Prior work on feature interaction attribution has mainly studied symmetric interactions, which explain only the additional influence of a set of words in combination and therefore fail to capture the asymmetric influence that contributes to model predictions. In this work, we propose an asymmetric feature interaction attribution explanation model that aims to explore asymmetric higher-order feature interactions in the inference of deep neural NLP models. By representing our explanation as a directed interaction graph, we experimentally demonstrate the interpretability of the graph for discovering asymmetric feature interactions. Experimental results on two sentiment classification datasets show that our model outperforms state-of-the-art feature interaction attribution methods in identifying influential features for model predictions. Our code is available at https://github.com/StillLu/ASIV.