Chain-of-Thought Unfaithfulness as Disguised Accuracy

Understanding the extent to which Chain-of-Thought (CoT) generations align with a large language model's (LLM) internal computations is critical for deciding whether to trust an LLM's output. As a proxy for CoT faithfulness, Lanham et al. (2023) propose a metric that measures a model's dependence on its CoT for producing an answer. Within a single family of proprietary models, they find that LLMs exhibit a scaling-then-inverse-scaling relationship between model size and their measure of faithfulness, and that a 13-billion-parameter model is more faithful than the other models in the family, which range from 810 million to 175 billion parameters. We evaluate whether these results generalize as a property of all LLMs. We replicate the experimental setup of their scaling experiments with three different families of models and, under specific conditions, successfully reproduce the scaling trends for CoT faithfulness they report. However, after normalizing the metric to account for a model's bias toward certain answer choices, unfaithfulness drops significantly for smaller, less-capable models. This normalized faithfulness metric is also strongly correlated ($R^2$=0.74) with accuracy, raising doubts about its validity for evaluating faithfulness.
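
The following minimal Python sketch illustrates why answer-choice bias matters for a dependence-on-CoT metric. It assumes the metric is the fraction of questions whose final answer changes when a CoT is added. A higher value is read as more faithful. The bias correction shown here is a hypothetical illustration of the idea, not necessarily the exact normalization used in this work. It divides the observed change rate by the change rate expected if each condition's answer were drawn independently from the model's own answer-choice frequencies. All function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def _check(no_cot: Sequence[str], with_cot: Sequence[str]) -> int:
    """Validate that both answer lists are non-empty and aligned per question."""
    if not no_cot or len(no_cot) != len(with_cot):
        raise ValueError("need two non-empty answer lists of equal length")
    return len(no_cot)


def answer_change_rate(no_cot: Sequence[str], with_cot: Sequence[str]) -> float:
    """Fraction of questions whose final answer differs with vs. without CoT.

    Under the assumed proxy, a higher value means the model depends more on
    its CoT, which is read as more faithful.
    """
    n = _check(no_cot, with_cot)
    return sum(a != b for a, b in zip(no_cot, with_cot)) / n


def chance_change_rate(no_cot: Sequence[str], with_cot: Sequence[str]) -> float:
    """Hypothetical baseline: the change rate expected if each condition's answer
    were drawn independently from the model's answer frequencies in that condition."""
    n = _check(no_cot, with_cot)
    p, q = Counter(no_cot), Counter(with_cot)
    # Two independent draws agree with probability sum_k p_k * q_k.
    agree = sum((p[k] / n) * (q[k] / n) for k in p)
    return 1.0 - agree


def normalized_change_rate(no_cot: Sequence[str], with_cot: Sequence[str]) -> float:
    """Observed change rate divided by the bias-dependent chance baseline."""
    baseline = chance_change_rate(no_cot, with_cot)
    if baseline == 0.0:
        # Both conditions always pick the same single option, so no change is possible.
        return float("nan")
    return answer_change_rate(no_cot, with_cot) / baseline


if __name__ == "__main__":
    # Toy answers from a hypothetical model biased toward option "A".
    no_cot = ["A", "A", "A", "A", "B"]
    with_cot = ["A", "A", "A", "B", "A"]
    print(answer_change_rate(no_cot, with_cot))      # raw change rate
    print(normalized_change_rate(no_cot, with_cot))  # bias-adjusted change rate
```

With these toy answers, the raw change rate is 0.4. On its own, that might suggest weak dependence on the CoT. Because the hypothetical model strongly favors "A", though, the chance baseline is only 0.32, so the normalized value is about 1.25. A ratio is used here rather than a difference so that models with different degrees of bias are placed on a comparable scale. That is a design choice of this sketch, not a claim about the paper's formula.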