LLMs as Counterfactual Explanation Modules: Can ChatGPT Explain Black-box Text Classifiers?

Large language models (LLMs) are increasingly used for tasks beyond text generation, including complex tasks such as data labeling and information extraction. Amid the recent surge of research into the full extent of LLM capabilities, we investigate the role of LLMs as counterfactual explanation modules that explain the decisions of black-box text classifiers. Inspired by causal thinking, we propose a pipeline for using LLMs to generate post-hoc, model-agnostic counterfactual explanations in a principled way by (i) leveraging the textual understanding capabilities of the LLM to identify and extract latent features, and (ii) leveraging the perturbation and generation capabilities of the same LLM to generate a counterfactual explanation by perturbing input features derived from the extracted latent features. We evaluate three variants of our framework, with varying degrees of specificity, on a suite of state-of-the-art LLMs, including ChatGPT and LLaMA 2. We assess the effectiveness and quality of the generated counterfactual explanations on a variety of text classification benchmarks. Our results show that performance varies across models and settings, with the full two-step, feature-extraction-based variant outperforming the others in most cases. Our pipeline can be used in automated explanation systems, potentially reducing human effort.
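The two-step pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function names (`extract_latent_features`, `generate_counterfactual`, `explain`), and the stand-in `toy_llm` and `toy_classifier` are all assumptions made for the example. The sketch also folds the mapping from latent features to input features into the second prompt rather than treating it as a separate step. In practice, `llm` would wrap a call to a real model and `classifier` would be the black-box model being explained.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]         # hypothetical interface: prompt in, completion out
Classifier = Callable[[str], str]  # black box: text in, predicted label out


def extract_latent_features(llm: LLM, text: str, label: str) -> List[str]:
    """Step (i): ask the LLM which latent features lie behind the predicted label."""
    prompt = (
        f"Text: {text}\nPredicted label: {label}\n"
        "List the latent features behind this label, comma-separated."
    )
    return [f.strip() for f in llm(prompt).split(",") if f.strip()]


def generate_counterfactual(llm: LLM, text: str, label: str, features: List[str]) -> str:
    """Step (ii): ask the same LLM to perturb the words tied to those features."""
    prompt = (
        f"Text: {text}\nCurrent label: {label}\n"
        f"Latent features: {', '.join(features)}\n"
        "Rewrite the text with minimal edits to the words tied to these "
        "features so that the label changes."
    )
    return llm(prompt).strip()


def explain(llm: LLM, classifier: Classifier, text: str) -> Dict[str, object]:
    """Run both steps, then check whether the black box actually changes its label."""
    label = classifier(text)
    features = extract_latent_features(llm, text, label)
    candidate = generate_counterfactual(llm, text, label, features)
    return {
        "original_label": label,
        "features": features,
        "counterfactual": candidate,
        "label_flipped": classifier(candidate) != label,
    }


# Toy stand-ins (assumptions) so the sketch runs without any external service.
def toy_llm(prompt: str) -> str:
    if "List the latent features" in prompt:
        return "sentiment, tone"
    return "The movie was terrible."


def toy_classifier(text: str) -> str:
    return "positive" if "great" in text else "negative"


result = explain(toy_llm, toy_classifier, "The movie was great.")
print(result)
```

Because the classifier is only ever called on text, the sketch stays model-agnostic and post-hoc in the sense the abstract describes; the final `label_flipped` check is one simple way to tell whether a candidate is a valid counterfactual.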

