This paper explores the role of Chain of Thought (CoT) in Large Language Model (LLM) reasoning. Although CoT can improve task performance, our analysis reveals that correct answers follow incorrect CoTs, and incorrect answers follow correct CoTs, surprisingly often. We employ causal analysis to assess the cause-effect relationship between CoTs/instructions and answers in LLMs, uncovering the Structural Causal Model (SCM) that LLMs approximate. By comparing this implied SCM with that of human reasoning, we highlight discrepancies between LLM and human reasoning processes. We further examine the factors influencing the causal structure of the implied SCM, revealing that in-context learning, supervised fine-tuning, and reinforcement learning from human feedback significantly impact the causal relations. We release the code and results at https://github.com/StevenZHB/CoT_Causal_Analysis.