Large language models (LLMs) have made significant strides in reasoning, and ongoing work seeks to refine that reasoning through self-correction. However, recent studies suggest that, without accurate external knowledge, self-correction can be limited or even counterproductive, raising questions about its limits and effectiveness. In this paper, we aim to enhance LLMs' self-checking capabilities by carefully designing training data, thereby improving the accuracy of self-correction. We conduct a detailed analysis of error types in mathematical reasoning and develop a tailored prompt, termed ``Step CoT Check''. We then construct a checking-correction dataset for training models. After training on a combination of the original CoT data and the checking-correction data, we observe that models improve their self-checking capabilities, which in turn enhances their self-correction capacity and eliminates the need for external feedback or ground-truth labels to determine when correction should stop. We compare models fine-tuned with the ``Step CoT Check'' prompt against models fine-tuned with other prompts on the checking-correction data. ``Step CoT Check'' outperforms the other two check formats on models with larger parameter counts, providing more precise feedback and thus achieving a higher rate of correctness. For reproducibility, all datasets and code are available at \url{https://github.com/bammt/Learn-to-check}.