Integrated Gradients is a well-known technique for explaining deep learning models. It computes feature importance scores with a gradient-based approach: gradients of the model output with respect to the input features are computed and accumulated along a linear path from a baseline to the input. While this works well for continuous feature spaces, it may not be optimal for discrete spaces such as word embeddings. For interpreting Large Language Models (LLMs), there is a need for a non-linear path whose intermediate points, at which gradients are computed, lie close to actual words in the embedding space. In this paper, we propose Uniform Discretized Integrated Gradients (UDIG), a method based on a new interpolation strategy that chooses a favorable non-linear path for computing attribution scores suited to predictive language models. We evaluate our method on two NLP tasks, Sentiment Classification and Question Answering, against three metrics: Log-odds, Comprehensiveness, and Sufficiency. For sentiment classification, we benchmark on the SST2, IMDb, and Rotten Tomatoes datasets; for Question Answering, we use a BERT model fine-tuned on the SQuAD dataset. Our approach outperforms existing methods on almost all metrics.
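The baseline Integrated Gradients computation that the abstract refers to can be sketched as follows; this is a minimal illustration with a toy differentiable function standing in for a neural network (the function `f`, its analytic gradient `grad_f`, and the midpoint Riemann sum are assumptions for illustration, not the paper's UDIG method):

```python
import numpy as np

# Toy differentiable "model": f(x) = sum(x_i^2), with analytic gradient 2x.
# In practice f would be a neural network and grad_f its autograd gradient.
def f(x):
    return np.sum(x ** 2)

def grad_f(x):
    return 2.0 * x

def integrated_gradients(x, baseline, steps=64):
    """Riemann-sum (midpoint rule) approximation of IG along the
    straight line from baseline to x -- the linear path that UDIG
    replaces with a non-linear, embedding-aware path."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(x, baseline)
# Completeness axiom: attributions sum to f(x) - f(baseline).
print(np.allclose(attr.sum(), f(x) - f(baseline)))
```

The key property checked at the end is completeness: the attributions sum to the difference in model output between the input and the baseline, which holds for any path-based attribution method when the integral is approximated accurately enough.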