Self-correction has emerged as a promising way to boost the reasoning performance of large language models (LLMs): the model refines its solution using self-generated critiques that pinpoint its errors. This work explores whether smaller language models (LMs), with at most 13B parameters, can self-correct on reasoning tasks with minimal input from stronger LMs. We propose a novel pipeline that prompts smaller LMs to collect self-correction data for training self-refinement abilities. First, we use correct solutions to guide the model in critiquing its own incorrect responses. Second, the generated critiques, after filtering, are used for supervised fine-tuning of a self-correcting reasoner through solution refinement. Our experimental results show improved self-correction abilities for two models on five datasets spanning math and commonsense reasoning, with notable performance gains when paired with a strong GPT-4-based verifier; however, limitations remain when a weak self-verifier is used to decide when to correct.
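The two-stage pipeline (hint-guided critique collection, then filtering into fine-tuning data) and the verifier-gated correction loop used at inference can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The prompt templates, the `Answer:` output format, the `Generate` interface, and the filtering rule (keep a critique only when its refined solution reaches the gold answer) are all hypothetical. `toy_model` stands in for a ≤13B LM, and the lambda verifier is an oracle that stands in for a strong GPT-4-based verifier.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interface: takes a prompt string, returns generated text.
Generate = Callable[[str], str]


@dataclass
class Example:
    question: str
    gold_solution: str
    gold_answer: str


@dataclass
class SFTRecord:
    question: str
    initial_solution: str
    critique: str  # critique plus refined solution; the fine-tuning target


def extract_answer(text: str) -> str:
    """Assumed answer format: the text after the last 'Answer:' marker."""
    return text.rsplit("Answer:", 1)[-1].strip()


def collect_self_correction_data(examples: List[Example], generate: Generate) -> List[SFTRecord]:
    records = []
    for ex in examples:
        initial = generate(f"Question: {ex.question}\nSolution:")
        if extract_answer(initial) == ex.gold_answer:
            continue  # only incorrect responses are critiqued
        # The correct solution is shown as a hint so the small model can pinpoint the error.
        hint_prompt = (
            f"Question: {ex.question}\nIncorrect solution: {initial}\n"
            f"Correct solution: {ex.gold_solution}\n"
            "Critique the incorrect solution, then write a refined solution:"
        )
        critique = generate(hint_prompt)
        # Assumed filter: keep the critique only if its refinement reaches the gold answer.
        if extract_answer(critique) == ex.gold_answer:
            records.append(SFTRecord(ex.question, initial, critique))
    return records


def self_correct(question: str, reasoner: Generate,
                 verifier: Callable[[str, str], bool], max_rounds: int = 1) -> str:
    """Verifier-gated refinement at inference time; the reasoner never sees a hint."""
    solution = reasoner(f"Question: {question}\nSolution:")
    for _ in range(max_rounds):
        if verifier(question, solution):
            break  # the verifier accepts the solution, so no correction is attempted
        solution = reasoner(
            f"Question: {question}\nSolution: {solution}\n"
            "Critique the solution, then write a refined solution:"
        )
    return solution


# Toy stand-in for a small LM: wrong at first, right once asked to critique.
def toy_model(prompt: str) -> str:
    if "Correct solution:" in prompt or "Critique" in prompt:
        return "The step 2+3=6 is wrong. Refined: 2+3=5. Answer: 5"
    return "2+3=6. Answer: 6"


data = [Example("What is 2+3?", "2+3=5. Answer: 5", "5")]
records = collect_self_correction_data(data, toy_model)
print(len(records))  # 1

# Oracle verifier used only for illustration (it checks against the known answer).
fixed = self_correct("What is 2+3?", toy_model, lambda q, s: extract_answer(s) == "5")
print(extract_answer(fixed))  # 5
```

Gating refinement on a verifier reflects the abstract's finding: correction helps when the decision of *when* to correct is reliable (as with a strong verifier), and it is limited when that decision comes from a weak self-verifier.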