Large language models (LLMs) have achieved promising performance on arithmetic reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting. However, LLMs struggle to maintain factual consistency during reasoning: they tend to overlook conditions, misinterpret questions, and hallucinate conditions that the given problem does not state. Existing methods use coarse-grained feedback (e.g., whether the answer is correct) to improve factual consistency. In this work, we propose RCoT (Reversing Chain-of-Thought), a novel method that improves LLMs' reasoning abilities by automatically detecting and rectifying factual inconsistencies in LLM-generated solutions. To detect factual inconsistencies, RCoT first asks the LLM to reconstruct the problem from its generated solution. A fine-grained comparison between the original problem and the reconstructed problem then exposes the factual inconsistencies in the original solution. To rectify the solution, RCoT formulates the detected inconsistencies into fine-grained feedback that guides the LLM in revising it. Experimental results demonstrate consistent improvements of RCoT over standard CoT across seven arithmetic datasets. Moreover, we find that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities (e.g., ChatGPT reaches 94.6% accuracy on GSM8K), encouraging the community to further explore fine-grained feedback generation methods.
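The detect-and-rectify loop described in the abstract (reconstruct the problem, compare it with the original, turn the differences into feedback, revise) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the function names `rcot` and `parse_feedback`, the `max_rounds` parameter, and all prompt wordings are hypothetical, and the abstract does not specify how the fine-grained comparison is actually carried out.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


def parse_feedback(comparison: str) -> List[str]:
    """Turn the comparison output into a list of inconsistency statements."""
    text = comparison.strip()
    if not text or text.upper() == "NONE":
        return []
    items = []
    for line in text.splitlines():
        # One inconsistency per non-empty line; drop leading bullet markers.
        item = line.strip().lstrip("-* ").strip()
        if item:
            items.append(item)
    return items


def rcot(problem: str, llm: LLM, max_rounds: int = 1) -> str:
    """Sketch of an RCoT-style loop; prompt wordings are assumptions."""
    # Step 0: ordinary chain-of-thought solution.
    solution = llm(f"Solve the problem step by step.\nProblem: {problem}")
    for _ in range(max_rounds):
        # Step 1: reconstruct the problem from the solution alone.
        reconstructed = llm(
            "Reconstruct the problem that this solution answers.\n"
            f"Solution: {solution}"
        )
        # Step 2: fine-grained comparison of original vs. reconstruction.
        comparison = llm(
            "Compare the two problems condition by condition. List each "
            "inconsistency on its own line, or reply NONE.\n"
            f"Original: {problem}\nReconstructed: {reconstructed}"
        )
        feedback = parse_feedback(comparison)
        if not feedback:
            break  # no inconsistency detected; keep the current solution
        # Step 3: feed the detected inconsistencies back for revision.
        mistakes = "\n".join(f"- {item}" for item in feedback)
        solution = llm(
            "Revise the solution using this feedback.\n"
            f"Problem: {problem}\nPrevious solution: {solution}\n"
            f"Mistakes:\n{mistakes}"
        )
    return solution
```

In this sketch the feedback names the specific overlooked or hallucinated condition rather than only signalling that the answer is wrong, which is the distinction the abstract draws between fine-grained and coarse-grained feedback. Any real use would pass a wrapper around an actual model client as `llm`.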