Recent progress in large language models (LLMs), especially the invention of chain-of-thought (CoT) prompting, has made it possible to solve reasoning problems. However, even the strongest LLMs still struggle with more complicated problems that require non-linear thinking and multi-step reasoning. In this work, we explore whether LLMs can recognize their own errors without resorting to external resources. In particular, we investigate whether they can be used to identify individual errors within step-by-step reasoning. To this end, we propose a zero-shot verification scheme to recognize such errors. We then use this scheme to improve question-answering performance by performing weighted voting over different generated answers. We test the method on three math datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final predictive performance.
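The weighted-voting idea can be illustrated with a minimal sketch. It assumes each generated solution comes with hypothetical per-step correctness scores in [0, 1] produced by some verifier, and it assumes a solution's weight is the score of its weakest step. The function name `weighted_vote`, the input layout, and the min-based aggregation are illustrative choices, not the scheme defined in this work.

```python
from collections import defaultdict


def weighted_vote(candidates):
    """Return the answer with the largest total verification weight.

    candidates: list of (answer, step_scores) pairs, where step_scores is a
    list of hypothetical per-step correctness scores in [0, 1].
    """
    if not candidates:
        raise ValueError("need at least one candidate solution")

    totals = defaultdict(float)
    for answer, step_scores in candidates:
        # Assumption: a solution is only as trustworthy as its weakest step.
        weight = min(step_scores) if step_scores else 0.0
        totals[answer] += weight

    # Pick the answer whose supporting solutions carry the most weight.
    return max(totals, key=totals.get)


# Two of three sampled solutions answer "20", but both contain a step the
# verifier scored low, so the single well-verified "18" wins the vote.
samples = [
    ("18", [0.9, 0.8]),
    ("20", [0.9, 0.2]),
    ("20", [0.6, 0.3]),
]
print(weighted_vote(samples))
```

With these made-up scores, plain majority voting would pick "20", while the weighted vote picks "18" because its total weight (0.8) exceeds the combined weight of the two "20" solutions (0.5).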