Gradient-based explanation methods play an important role in interpreting the complex deep neural networks behind NLP models. However, existing work has shown that a model's gradients are unstable and easily manipulated, which substantially undermines the model's reliability. Our preliminary analyses further show that the interpretability of gradient-based methods is limited on complex tasks such as aspect-based sentiment classification (ABSC). In this paper, we propose \texttt{IEGA}, an \textbf{I}nterpretation-\textbf{E}nhanced \textbf{G}radient-based framework for \textbf{A}BSC that relies on only a small number of explanation annotations. Specifically, we first compute a word-level saliency map from gradients to measure the importance of each word in the sentence with respect to the given aspect. We then design a gradient correction module that strengthens the model's attention to the correct parts (e.g., opinion words). The framework is model-agnostic and task-agnostic, so it can be integrated into existing ABSC methods or applied to other tasks. Comprehensive experiments on four benchmark datasets show that \texttt{IEGA} improves not only the interpretability of the model but also its performance and robustness.
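To make the two steps concrete, the following is a minimal sketch and not the actual \texttt{IEGA} implementation. It assumes a toy scorer (mean-pooled tanh features with a linear readout) in place of a real ABSC model. The names `score`, `word_saliency`, and `correction_loss`, the parameters `U` and `w`, and the specific form of the correction penalty are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math

def score(embeddings, U, w):
    """Toy sentiment score (assumed model): mean over words of w . tanh(U e_i)."""
    total = 0.0
    for e in embeddings:
        for k, row in enumerate(U):
            total += w[k] * math.tanh(sum(u * x for u, x in zip(row, e)))
    return total / len(embeddings)

def word_saliency(embeddings, U, w):
    """Word-level saliency: L1 norm of d(score)/d(e_i), normalised to sum to 1."""
    n = len(embeddings)
    raw = []
    for e in embeddings:
        grad = [0.0] * len(e)
        for k, row in enumerate(U):
            h = math.tanh(sum(u * x for u, x in zip(row, e)))
            # Chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
            coeff = w[k] * (1.0 - h * h) / n
            for j, u in enumerate(row):
                grad[j] += coeff * u
        raw.append(sum(abs(g) for g in grad))
    total = sum(raw)
    # Fall back to a uniform map if every gradient vanishes.
    return [r / total for r in raw] if total > 0 else [1.0 / n] * n

def correction_loss(saliency, opinion_mask):
    """Hypothetical penalty: negative log of saliency mass on annotated opinion words."""
    mass = sum(s for s, m in zip(saliency, opinion_mask) if m)
    return -math.log(mass + 1e-12)

# Three words with 2-d embeddings; only the second word is annotated as an opinion word.
embeddings = [[0.1, -0.2], [1.5, 0.3], [0.0, 0.0]]
U = [[1.0, 0.5], [-0.5, 1.0]]
w = [1.0, -1.0]
saliency = word_saliency(embeddings, U, w)
penalty = correction_loss(saliency, [0, 1, 0])
```

In a training loop, a penalty of this kind would be added to the classification loss for the small set of sentences that carry explanation annotations, so that gradient mass shifts toward the annotated words. How the two terms are weighted is left open here.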