Large language models (LLMs) have made significant strides in reasoning, and there are ongoing efforts to refine their reasoning through self-correction. However, recent studies suggest that self-correction can be limited or even counterproductive without accurate external knowledge, raising questions about its limits and effectiveness. In this paper, we aim to enhance LLMs' self-checking capabilities by carefully designing training data, thereby improving the accuracy of self-correction. We conduct a detailed analysis of error types in mathematical reasoning and develop a tailored prompt, termed "Step CoT Check". We then construct a checking-correction dataset for training models. After training on a combination of the original CoT data and the checking-correction data, we observe that models improve their self-checking capabilities, which in turn enhances their self-correction capacity and eliminates the need for external feedback or ground-truth labels to determine when correction should stop. We compare the performance of models fine-tuned with the "Step CoT Check" prompt against that of models fine-tuned with other prompts on the checking-correction data. "Step CoT Check" outperforms the other two check formats on models with larger parameter counts, providing more precise feedback and thus achieving a higher rate of correctness. For reproducibility, all datasets and code are available at https://github.com/bammt/Learn-to-check.