Large language models (LLMs) have attracted huge interest in practical applications given their increasingly accurate responses and coherent reasoning abilities. Because these models are black boxes that apply complex reasoning processes to their inputs, demand for scalable and faithful explanations of LLM-generated content will inevitably continue to grow. There have been major developments in the explainability of neural network models over the past decade. Among them, post-hoc explainability methods, especially Shapley values, have proven effective for interpreting deep learning models. However, there are major challenges in scaling up Shapley values for LLMs, particularly when dealing with long input contexts containing thousands of tokens and autoregressively generated output sequences. Furthermore, it is often unclear how to use the generated explanations effectively to improve the performance of LLMs. In this paper, we introduce TextGenSHAP, an efficient post-hoc explanation method incorporating LM-specific techniques. We demonstrate that this leads to significant increases in speed compared to conventional Shapley value computations, reducing processing times from hours to minutes for token-level explanations, and to just seconds for document-level explanations. In addition, we demonstrate how real-time Shapley values can be used in two important scenarios: providing a better understanding of long-document question answering by localizing important words and sentences, and improving existing document retrieval systems by enhancing the accuracy of selected passages and, ultimately, of the final responses.
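To make the scaling challenge concrete, the following is a minimal sketch of a conventional, exact Shapley value computation at the document level. It is not the TextGenSHAP algorithm; the function names (`exact_shapley`, `toy_value`), the document identifiers, and the scores are hypothetical and stand in for an actual LLM call that would score an answer given a subset of retrieved documents.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def exact_shapley(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values: average each player's marginal
    contribution over every possible ordering of the players."""
    totals = {p: 0.0 for p in players}
    n_orderings = 0
    for order in permutations(players):
        coalition: FrozenSet[str] = frozenset()
        prev = value(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = value(coalition)
            totals[p] += cur - prev  # marginal contribution of p
            prev = cur
        n_orderings += 1
    return {p: t / n_orderings for p, t in totals.items()}


def toy_value(docs: FrozenSet[str]) -> float:
    """Hypothetical stand-in for an LLM's answer quality given a
    subset of documents; a real system would query the model here."""
    score = 0.0
    if "doc_a" in docs:
        score += 0.6
    if "doc_b" in docs:
        score += 0.2
    if "doc_a" in docs and "doc_b" in docs:
        score += 0.2  # the two documents are more useful together
    return score  # doc_c never changes the score


if __name__ == "__main__":
    attributions = exact_shapley(["doc_a", "doc_b", "doc_c"], toy_value)
    for name, phi in attributions.items():
        print(f"{name}: {phi:.3f}")
```

In this toy setup `doc_a` receives roughly 0.7, `doc_b` roughly 0.3, and `doc_c` zero, and the attributions sum to the full-context score. The loop visits all n! orderings and evaluates the value function at every step, which is why a naive computation of this kind becomes impractical for contexts with thousands of tokens.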