On Measuring Faithfulness of Natural Language Explanations

Large language models (LLMs) can explain their own predictions through post-hoc or Chain-of-Thought (CoT) explanations. However, an LLM could make up reasonable-sounding explanations that are unfaithful to its underlying reasoning. Recent work has designed tests that aim to judge the faithfulness of either post-hoc or CoT explanations. In this paper we argue that existing faithfulness tests do not actually measure faithfulness in terms of the models' inner workings, but only evaluate their self-consistency at the output level. The aims of our work are two-fold. i) We aim to clarify the status of existing faithfulness tests in terms of model explainability, characterising them as self-consistency tests instead. We underline this assessment by constructing a Comparative Consistency Bank for self-consistency tests that, for the first time, compares existing tests on a common suite of 11 open-source LLMs and 5 datasets, including ii) our own proposed self-consistency measure, CC-SHAP. CC-SHAP is a new fine-grained measure (not a test) of LLM self-consistency that compares how a model's inputs contribute to the predicted answer with how they contribute to the generated explanation. With CC-SHAP, we aim to take a step further towards measuring faithfulness with a more interpretable and fine-grained method. Code is available at https://github.com/Heidelberg-NLP/CC-SHAP
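
To make the idea of comparing input contributions concrete, here is a minimal sketch in Python. It is not the authors' implementation (see the repository linked above for that). It assumes per-token contribution values have already been computed separately for the answer prediction and for the generated explanation, and it assumes cosine similarity as the comparison function, which the abstract does not specify. The function names, tokens, and contribution values are all hypothetical.

```python
import math


def normalize(contribs):
    """Scale contribution values so that their absolute values sum to 1."""
    total = sum(abs(c) for c in contribs)
    if total == 0:
        raise ValueError("all contributions are zero")
    return [c / total for c in contribs]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def consistency_score(pred_contribs, expl_contribs):
    """Assumed comparison: cosine similarity of normalized contributions.

    Values near 1 mean the same input tokens drive both the prediction and
    the explanation; values near -1 mean they pull in opposite directions.
    """
    return cosine_similarity(normalize(pred_contribs), normalize(expl_contribs))


# Hypothetical per-token contributions for a four-token input.
tokens = ["the", "movie", "was", "great"]
pred_contribs = [0.1, 0.6, -0.2, 0.1]        # towards the predicted answer
expl_aligned = [0.2, 1.2, -0.4, 0.2]         # same pattern, different scale
expl_opposed = [-0.1, -0.6, 0.2, -0.1]       # reversed pattern

print(consistency_score(pred_contribs, expl_aligned))
print(consistency_score(pred_contribs, expl_opposed))
```

Because the score is a continuous value rather than a pass/fail outcome, it behaves as a measure rather than a test, which is the distinction the abstract draws.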

