While self-correction has shown promise for improving the style and quality of LLM outputs (e.g., Chen et al., 2023; Madaan et al., 2023), recent attempts to self-correct logical or reasoning errors often cause correct answers to become incorrect, resulting in worse performance overall (Huang et al., 2023). In this paper, we break down the self-correction process into two core components: mistake finding and output correction. For mistake finding, we release BIG-Bench Mistake, a dataset of logical mistakes in Chain-of-Thought reasoning traces. We provide benchmark numbers for several state-of-the-art LLMs and demonstrate that LLMs generally struggle to find logical mistakes. For output correction, we propose a backtracking method that yields large improvements when given information about mistake location. We frame backtracking as a lightweight alternative to reinforcement learning methods and show that it remains effective with a reward model at 60–70% accuracy.