As NLP models become more complex, understanding their decisions becomes more crucial. Counterfactuals (CFs), where minimal changes to inputs flip a model's prediction, offer a way to explain these models. While Large Language Models (LLMs) have shown remarkable performance in NLP tasks, their efficacy in generating high-quality CFs remains uncertain. This work fills this gap by investigating how well LLMs generate CFs for two NLU tasks. We conduct a comprehensive comparison of several common LLMs and evaluate their CFs, assessing both intrinsic metrics and the impact of these CFs on data augmentation. Moreover, we analyze differences between human- and LLM-generated CFs, providing insights for future research directions. Our results show that LLMs generate fluent CFs but struggle to keep the induced changes minimal. Generating CFs for Sentiment Analysis (SA) is less challenging than for NLI, where LLMs show weaknesses in generating CFs that flip the original label. This is also reflected in data augmentation performance, where we observe a large gap between augmenting with human-generated and LLM-generated CFs. Furthermore, we evaluate LLMs' ability to assess CFs in a mislabelled data setting and show that they have a strong bias towards agreeing with the provided labels. GPT-4 is more robust against this bias, and its scores correlate well with automatic metrics. Our findings reveal several limitations and point to potential future work directions.