MASE: Interpretable NLP Models via Model-Agnostic Saliency Estimation

Deep neural networks (DNNs) have made significant strides in natural language processing (NLP), yet their decision-making processes remain difficult to interpret. Traditional post-hoc interpretation methods, such as saliency maps and feature visualization, may not be directly applicable to the discrete nature of word data in NLP. To address this, we introduce the Model-Agnostic Saliency Estimation (MASE) framework. MASE provides local explanations for text-based predictive models without requiring in-depth knowledge of a model's internal architecture. By applying Normalized Linear Gaussian Perturbations (NLGP) to the embedding layer rather than to raw word inputs, MASE efficiently estimates input saliency. Our results indicate that MASE outperforms other model-agnostic interpretation methods, especially in terms of Delta Accuracy, positioning it as a promising tool for explaining how text-based NLP models operate.
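
To make the idea of perturbing embeddings rather than discrete words concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how NLGP is normalized or fitted, so the sketch substitutes a generic recipe: add Gaussian noise to the token embeddings, fit a linear surrogate to the model's outputs by least squares, and score each token by the norm of its surrogate weights. The names `perturbation_saliency`, `toy_model`, `n_samples`, and `sigma` are hypothetical and chosen only for illustration.

```python
import numpy as np


def perturbation_saliency(predict_fn, embeddings, n_samples=200, sigma=0.1, seed=0):
    """Score tokens via a linear surrogate fitted to Gaussian embedding perturbations.

    Assumption: this is a generic stand-in, not the NLGP procedure itself.
    predict_fn maps an (n_tokens, dim) array to a scalar model output.
    """
    rng = np.random.default_rng(seed)
    n_tokens, dim = embeddings.shape

    # Sample Gaussian noise in embedding space and query the black-box model.
    noise = rng.normal(0.0, sigma, size=(n_samples, n_tokens, dim))
    outputs = np.array([predict_fn(embeddings + eps) for eps in noise])

    # Fit output ~ intercept + <weights, noise> by ordinary least squares.
    design = np.hstack([noise.reshape(n_samples, -1), np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(design, outputs, rcond=None)
    weights = coef[:-1].reshape(n_tokens, dim)

    # One score per token: norm of its weight vector, normalized to sum to 1.
    scores = np.linalg.norm(weights, axis=1)
    total = scores.sum()
    return scores / total if total > 0 else scores


def toy_model(emb):
    """Hypothetical stand-in for a text classifier: linear in the embeddings."""
    token_weights = np.array([0.1, 2.0, 0.5, 0.0])
    direction = np.array([1.0, -1.0, 0.5])
    return float(token_weights @ (emb @ direction))


embeddings = np.zeros((4, 3))  # four tokens, three-dimensional embeddings
saliency = perturbation_saliency(toy_model, embeddings)
```

Because the toy model is linear, the surrogate recovers its weights almost exactly, so the second token receives the largest score. A real classifier is nonlinear, so the surrogate would only approximate its local behavior around the input, and `sigma` would control how local that approximation is.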
