Evaluation Framework for Highlight Explanations of Context Utilisation in Language Models

Context utilisation, the ability of Language Models (LMs) to incorporate relevant information from the provided context when generating responses, remains largely opaque to users, who cannot determine whether a model draws on parametric memory or on the provided context, nor identify which specific pieces of context inform the response. Highlight explanations (HEs) offer a natural solution, as they can pinpoint the exact context pieces and tokens that influenced model outputs. However, no existing work evaluates how accurately they explain context utilisation. We address this gap by introducing the first gold standard HE evaluation framework for context attribution, using controlled test cases with known ground-truth context usage, which avoids the limitations of existing indirect proxy evaluations. To demonstrate the framework's broad applicability, we evaluate four HE methods (three established techniques and MechLight, a mechanistic interpretability approach we adapt for this task) across four context scenarios, four datasets, and five LMs. Overall, we find that MechLight performs best across all context scenarios. However, all methods struggle with longer contexts and exhibit positional biases, pointing to fundamental challenges in explanation accuracy that require new approaches to deliver reliable context utilisation explanations at scale.

