Large language models (LLMs) perform better on downstream tasks when they generate Chain of Thought reasoning text before producing an answer. Our research investigates how LLMs recover from errors in Chain of Thought, reaching the correct final answer despite mistakes in the reasoning text. Through analysis of these error recovery behaviors, we find evidence for unfaithfulness in Chain of Thought, but we also identify many clear examples of faithful error recovery. We identify factors that shift LLM recovery behavior: LLMs recover more frequently from obvious errors and in contexts that provide more evidence for the correct answer. Unfaithful recoveries, however, show the opposite behavior, occurring more frequently for more difficult error positions. These results indicate that distinct mechanisms drive faithful and unfaithful error recoveries, and they challenge the view that LLM reasoning is a uniform, coherent process.