To harness the power of large language models in safety-critical domains, we need to ensure the explainability of their predictions. However, despite the significant attention given to model interpretability, one area remains unexplored: explaining sequence-to-sequence tasks with methods tailored to textual data. This paper introduces SyntaxShap, a local, model-agnostic explainability method for text generation that accounts for the syntax of the text data. The presented work extends Shapley values to account for parsing-based syntactic dependencies. Taking a game-theoretic approach, SyntaxShap considers only coalitions constrained by the dependency tree. We adopt a model-based evaluation to compare SyntaxShap and its weighted form with state-of-the-art explainability methods adapted to text generation tasks, using diverse metrics including faithfulness, complexity, coherency, and semantic alignment of the explanations with the model. We show that our syntax-aware method produces more faithful, coherent, and interpretable explanations for the predictions of autoregressive models.
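To make the idea of tree-constrained coalitions concrete, the following is a minimal illustrative sketch and not the SyntaxShap algorithm itself. It rests on several assumptions that do not come from the abstract: a coalition counts as admissible only if every token in it appears together with its syntactic head, marginal contributions are averaged uniformly rather than with Shapley weights, and the names `is_admissible`, `tree_constrained_attributions`, `heads`, and `toy_value` are hypothetical. The value function is a toy additive stand-in for a model's output.

```python
from itertools import combinations

def is_admissible(coalition, heads):
    """Assumed constraint: every token in the coalition has its head in it too.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    """
    return all(heads[i] == -1 or heads[i] in coalition for i in coalition)

def tree_constrained_attributions(heads, value_fn):
    """Average each token's marginal contribution over admissible coalitions only.

    This is a uniform average (an assumption), not the classical Shapley weighting.
    It enumerates all subsets, so it is only practical for very short inputs.
    """
    n = len(heads)
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contributions = []
        for size in range(n):  # subset sizes 0 .. n-1
            for subset in combinations(others, size):
                without_i = frozenset(subset)
                with_i = without_i | {i}
                # Keep the pair only if both coalitions respect the tree.
                if is_admissible(without_i, heads) and is_admissible(with_i, heads):
                    contributions.append(value_fn(with_i) - value_fn(without_i))
        scores.append(sum(contributions) / len(contributions) if contributions else 0.0)
    return scores

# Toy example: "the cat sleeps" with the -> cat -> sleeps (root).
tokens = ["the", "cat", "sleeps"]
heads = [1, 2, -1]

# Hypothetical additive value function standing in for a model's output score.
weights = [1.0, 2.0, 3.0]

def toy_value(coalition):
    return sum(weights[i] for i in coalition)

scores = tree_constrained_attributions(heads, toy_value)
for token, score in zip(tokens, scores):
    print(f"{token}: {score}")
```

With an additive value function the attributions simply recover each token's weight, which serves as a sanity check. The constraint matters once the value function has interactions, because coalitions that break the dependency tree (for example, "the" without "cat") are never evaluated.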