ReAGent: Towards A Model-agnostic Feature Attribution Method for Generative Language Models

Feature attribution methods (FAs), such as gradients and attention, are widely used to derive the importance of each input feature to a model's predictions. Existing work in natural language processing has mostly focused on developing and testing FAs for encoder-only language models (LMs) in classification tasks. However, it is unknown whether these FAs remain faithful for decoder-only models on text generation, given the inherent differences in both model architecture and task setting. Moreover, previous work has demonstrated that there is no "one-wins-all" FA across models and tasks. This makes selecting an FA computationally expensive for large LMs, since deriving input importance often requires multiple forward and backward passes, including gradient computations that might be prohibitive even with access to large compute. To address these issues, we present a model-agnostic FA for generative LMs called Recursive Attribution Generator (ReAGent). Our method updates the token importance distribution in a recursive manner. For each update, we compute the difference in the probability distribution over the vocabulary for predicting the next token between using the original input and using a modified version in which part of the input is replaced with RoBERTa predictions. Our intuition is that replacing an important token in the context should result in a larger change in the model's confidence in predicting the token than replacing an unimportant token. Our method can be applied to any generative LM without accessing internal model weights and without additional training or fine-tuning, which most other FAs require. We extensively compare the faithfulness of ReAGent with seven popular FAs across six decoder-only LMs of various sizes. The results show that our method consistently provides more faithful token importance distributions.
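The recursive update described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the names `recursive_attribution`, `toy_predict_proba`, and `toy_replace_token` are hypothetical; the toy predictor stands in for a generative LM; random token replacement stands in for RoBERTa predictions; the score update (add the probability drop to replaced positions, subtract it from kept ones) is a simplified rule; and the change is measured only on the target token's probability rather than on the full distribution over the vocabulary.

```python
import math
import random
from typing import Callable, List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def recursive_attribution(
    tokens: Sequence[int],
    target_id: int,
    predict_proba: Callable[[Sequence[int]], List[float]],
    replace_token: Callable[[Sequence[int], int, random.Random], int],
    n_updates: int = 200,
    replace_ratio: float = 0.3,
    seed: int = 0,
) -> List[float]:
    """Return an importance distribution over input positions.

    Only the model's output probabilities are used, so no access to
    internal weights or gradients is needed.
    """
    rng = random.Random(seed)
    n = len(tokens)
    scores = [0.0] * n
    base_p = predict_proba(tokens)[target_id]
    k = max(1, round(replace_ratio * n))  # positions replaced per update
    for _ in range(n_updates):
        replaced = set(rng.sample(range(n), k))
        perturbed = [
            replace_token(tokens, i, rng) if i in replaced else t
            for i, t in enumerate(tokens)
        ]
        # Drop in the model's confidence for the target token.
        delta = base_p - predict_proba(perturbed)[target_id]
        # Assumed update rule: credit replaced positions, debit kept ones.
        for i in range(n):
            scores[i] += delta if i in replaced else -delta
    return softmax(scores)


def toy_predict_proba(tokens: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for an LM: a 3-word vocabulary in which
    word 2 becomes likely only when token 7 sits at position 1."""
    logits = [0.0, 0.0, 0.0]
    if tokens[1] == 7:
        logits[2] += 3.0
    return softmax(logits)


def toy_replace_token(tokens: Sequence[int], i: int, rng: random.Random) -> int:
    """Hypothetical stand-in for RoBERTa infilling: pick a different token id."""
    return rng.choice([t for t in range(10) if t != tokens[i]])


tokens = [3, 7, 1, 4, 2]
importance = recursive_attribution(
    tokens, target_id=2,
    predict_proba=toy_predict_proba,
    replace_token=toy_replace_token,
)
print([round(p, 3) for p in importance])
```

In this toy setting the prediction depends only on position 1, so that position should receive the largest share of the importance distribution. With a real LM, `predict_proba` would wrap a forward pass that returns next-token probabilities, which is the only model access the sketch requires.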

