Perturbation-based explanation methods such as LIME and SHAP are commonly applied to text classification. This work focuses on extending them to generative language models. To address the challenges posed by text as output and by long text inputs, we propose a general framework called MExGen that can be instantiated with different attribution algorithms. To handle text output, we introduce the notion of scalarizers, which map text to real numbers, and investigate several choices of scalarizer. To handle long inputs, we take a multi-level approach, proceeding from coarser levels of granularity to finer ones, and focus on algorithms whose number of model queries scales linearly. We conduct a systematic evaluation, both automated and human, of perturbation-based attribution methods for summarization and context-grounded question answering. The results show that our framework can provide more locally faithful explanations of generated outputs.
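The abstract does not specify MExGen's interface or which attribution algorithms it instantiates, so the following is only a minimal, hypothetical sketch of the two ideas it names. It assumes (1) a scalarizer that compares a perturbed output with the original output, here a set-based unigram F1 chosen purely for illustration, and (2) a two-level, coarse-to-fine leave-one-out attribution: sentences are scored first, then words are scored only inside the top-ranked sentences, so each level costs one model query per unit plus one for the unperturbed input. All names (`overlap_scalarizer`, `leave_one_out`, `multilevel_attribution`, `toy_model`, `CountingModel`) are hypothetical and are not the MExGen API.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]              # hypothetical: any text-in, text-out generator
Scalarizer = Callable[[str, str], float]  # maps (original output, perturbed output) to a real number


def overlap_scalarizer(original: str, perturbed: str) -> float:
    """Illustrative scalarizer: set-based unigram F1 between original and perturbed outputs."""
    ref, cand = set(original.lower().split()), set(perturbed.lower().split())
    common = len(ref & cand)
    if common == 0:
        return 0.0
    precision, recall = common / len(cand), common / len(ref)
    return 2 * precision * recall / (precision + recall)


def leave_one_out(units: List[str], join: Callable[[List[str]], str],
                  model: Model, scalarize: Scalarizer) -> List[float]:
    """Occlusion attribution: one query per removed unit plus one for the full input."""
    original = model(join(units))
    base = scalarize(original, original)
    scores = []
    for i in range(len(units)):
        perturbed = model(join(units[:i] + units[i + 1:]))
        # A larger drop in the scalarized output means the removed unit mattered more.
        scores.append(base - scalarize(original, perturbed))
    return scores


def multilevel_attribution(sentences: List[str], model: Model, scalarize: Scalarizer,
                           top_k: int = 1) -> Tuple[List[float], Dict[int, List[float]]]:
    """Score sentences first, then refine to words only inside the top_k sentences."""
    sent_scores = leave_one_out(sentences, " ".join, model, scalarize)
    top = sorted(range(len(sentences)), key=lambda i: sent_scores[i], reverse=True)[:top_k]
    word_scores: Dict[int, List[float]] = {}
    for i in top:
        def join_words(words: List[str], i: int = i) -> str:
            # Rebuild the full input, perturbing only sentence i at the word level.
            return " ".join(sentences[:i] + [" ".join(words)] + sentences[i + 1:])
        word_scores[i] = leave_one_out(sentences[i].split(), join_words, model, scalarize)
    return sent_scores, word_scores


class CountingModel:
    """Wraps a model and counts queries so the linear query cost is visible."""
    def __init__(self, fn: Model) -> None:
        self.fn, self.calls = fn, 0

    def __call__(self, text: str) -> str:
        self.calls += 1
        return self.fn(text)


def toy_model(text: str) -> str:
    """Hypothetical toy 'generator': echoes the all-caps acronyms found in its input."""
    return " ".join(w for w in text.split() if w.isupper())


sentences = ["LIME and SHAP are popular.", "Both perturb the input.", "We study LLM outputs."]
counted = CountingModel(toy_model)
sent_scores, word_scores = multilevel_attribution(sentences, counted, overlap_scalarizer)
print(sent_scores)    # sentence 0 scores highest
print(word_scores)    # word-level scores for sentence 0 only
print(counted.calls)  # 10 queries: (3 + 1) sentence-level + (5 + 1) word-level
```

Leave-one-out is used here only because it needs one query per unit; other attribution algorithms could be plugged in at each level. Refining words only inside the selected sentences keeps the total number of queries proportional to the number of sentences plus the words in those sentences, rather than to every word in a long input.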