The influence of contextual input on the behavior of large language models (LLMs) has prompted the development of context attribution methods that aim to quantify each context span's effect on an LLM's generations. The leave-one-out (LOO) error, which measures the change in the likelihood of the LLM's response when a given span of the context is removed, provides a principled way to perform context attribution, but can be prohibitively expensive to compute for large models. In this work, we introduce AttriBoT, a series of novel techniques for efficiently computing an approximation of the LOO error for context attribution. Specifically, AttriBoT uses cached activations to avoid redundant operations, performs hierarchical attribution to reduce computation, and emulates the behavior of large target models with smaller proxy models. Taken together, AttriBoT can provide a >300x speedup while remaining more faithful to a target model's LOO error than prior context attribution methods. This stark increase in performance makes computing context attributions for a given response 30x faster than generating the response itself, empowering real-world applications that require computing attributions at scale. We release a user-friendly and efficient implementation of AttriBoT to enable efficient LLM interpretability as well as encourage future development of efficient context attribution methods.
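The leave-one-out scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `log_likelihood` is a hypothetical stand-in for the model's response log-likelihood given a context, which in practice would require a forward pass of the LLM per ablation.

```python
def loo_attribution(spans, log_likelihood):
    """Score each context span by the drop in response log-likelihood
    when that span is removed from the context (leave-one-out error)."""
    full = log_likelihood(spans)  # likelihood with the complete context
    scores = []
    for i in range(len(spans)):
        ablated = spans[:i] + spans[i + 1:]  # context with span i removed
        scores.append(full - log_likelihood(ablated))
    return scores

# Toy likelihood for illustration only: the response is likely
# if and only if the span "evidence" is present in the context.
def toy_ll(spans):
    return -1.0 if "evidence" in spans else -5.0

scores = loo_attribution(["filler", "evidence", "noise"], toy_ll)
# "evidence" receives a large positive score; irrelevant spans score 0
```

Note that the naive loop costs one model evaluation per span, which is exactly the expense that AttriBoT's caching, hierarchical attribution, and proxy models are designed to reduce.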