ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models

Feature attribution methods (FAs), such as gradients and attention, are widely used to derive the importance of each input feature to a model's predictions. Existing work in natural language processing has mostly focused on developing and testing FAs for encoder-only language models (LMs) in classification tasks. However, it is unknown whether these FAs remain faithful when applied to decoder-only models on text generation, given the inherent differences in model architecture and task setting. Moreover, previous work has demonstrated that there is no "one-wins-all" FA across models and tasks. This makes selecting an FA computationally expensive for large LMs, since deriving input importance often requires multiple forward and backward passes, including gradient computations, that might be prohibitive even with access to large compute. To address these issues, we present a model-agnostic FA for generative LMs called Recursive Attribution Generator (ReAGent). Our method updates the token importance distribution in a recursive manner. For each update, we compute the difference in the probability distribution over the vocabulary for predicting the next token between using the original input and using a modified version in which part of the input is replaced with RoBERTa predictions. Our intuition is that replacing an important token in the context should result in a larger change in the model's confidence in predicting the token than replacing an unimportant token. Our method can be universally applied to any generative LM without accessing internal model weights and without additional training or fine-tuning, which most other FAs require. We extensively compare the faithfulness of ReAGent with seven popular FAs across six decoder-only LMs of various sizes. The results show that our method consistently provides more faithful token importance distributions.
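The update loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several pieces are assumptions made for illustration: `toy_next_token_probs` is a hypothetical stand-in for a generative LM, replacements are drawn at random rather than from RoBERTa predictions, the "difference" is measured as the drop in the target token's probability, and the names `attribute`, `n_iters`, and `replace_ratio` are invented here. The sketch only shows the general idea: the model is queried as a black box, and positions whose replacement lowers the model's confidence accumulate higher importance.

```python
import math
import random

VOCAB_SIZE = 6  # assumption: a tiny toy vocabulary of token ids 0..5


def toy_next_token_probs(tokens):
    """Hypothetical stand-in for a generative LM (not a real model).

    Returns a softmax distribution over the vocabulary whose logits depend
    only on the token at position 1, so position 1 is the only input
    position that truly matters for the prediction.
    """
    key = tokens[1]
    logits = [3.0 if v == key else 0.0 for v in range(VOCAB_SIZE)]
    z = sum(math.exp(l) for l in logits)
    return [math.exp(l) / z for l in logits]


def attribute(tokens, target, predict, n_iters=200, replace_ratio=0.3, seed=0):
    """Estimate token importance by repeated replace-and-compare updates.

    Only `predict` is called, so no weights or gradients are needed.
    """
    rng = random.Random(seed)
    n = len(tokens)
    totals = [0.0] * n  # accumulated probability drops per position
    counts = [0] * n    # how often each position was replaced
    p_orig = predict(tokens)[target]
    k = max(1, round(replace_ratio * n))

    for _ in range(n_iters):
        positions = rng.sample(range(n), k)
        modified = list(tokens)
        for i in positions:
            # Assumption: a random different token replaces the original;
            # the paper uses RoBERTa predictions for this step.
            choices = [v for v in range(VOCAB_SIZE) if v != tokens[i]]
            modified[i] = rng.choice(choices)
        # Drop in the model's confidence in the target token.
        delta = p_orig - predict(modified)[target]
        for i in positions:
            totals[i] += delta
            counts[i] += 1

    avg = [t / c if c else 0.0 for t, c in zip(totals, counts)]
    # Normalise the averaged drops into an importance distribution.
    m = max(avg)
    exps = [math.exp(a - m) for a in avg]
    norm = sum(exps)
    return [e / norm for e in exps]


if __name__ == "__main__":
    scores = attribute([2, 4, 1, 3, 0], target=4, predict=toy_next_token_probs)
    for position, score in enumerate(scores):
        print(position, round(score, 3))
```

In this toy setting the prediction depends only on position 1, so that position should receive the largest share of the importance distribution. The other positions receive some credit only because they are sometimes replaced together with position 1, which is one reason many repeated updates are needed.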

