Large language models (LLMs) have achieved promising performance on arithmetic reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting. However, LLMs struggle to maintain factual consistency during reasoning: they tend to overlook conditions, misinterpret the question, and hallucinate conditions for the given problem. Existing methods use coarse-grained feedback (e.g., whether the answer is correct) to improve factual consistency. In this work, we propose RCoT (Reversing Chain-of-Thought), a novel method that improves LLMs' reasoning abilities by automatically detecting and rectifying factual inconsistency in LLM-generated solutions. To detect factual inconsistency, RCoT first asks the LLM to reconstruct the problem from its generated solution. A fine-grained comparison between the original problem and the reconstructed problem then exposes the factual inconsistencies in the original solution. To rectify the solution, RCoT formulates the detected inconsistencies into fine-grained feedback that guides the LLM in revising its solution. Experimental results demonstrate improvements of RCoT over standard CoT, Self-Consistency, and Self-Refine across seven arithmetic datasets. Moreover, we find that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities (e.g., ChatGPT reaches 94.6% accuracy on GSM8K), encouraging the community to further explore fine-grained feedback generation methods.
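The detect-and-rectify loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `llm` callable, the `rcot` function name, the `max_rounds` parameter, the `NONE` sentinel, and all prompt wordings are hypothetical placeholders introduced here.

```python
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]

# Assumed sentinel that the comparison prompt asks the model to emit
# when it finds no inconsistency (not taken from the paper).
NO_ISSUES = "NONE"


def rcot(problem: str, llm: LLM, max_rounds: int = 1) -> str:
    """Sketch of a reconstruct / compare / revise loop around a CoT solution."""
    # Step 0: ordinary chain-of-thought solution.
    solution = llm(f"Solve the problem step by step.\nProblem: {problem}")

    for _ in range(max_rounds):
        # Step 1: reconstruct the problem from the solution alone.
        reconstructed = llm(
            "Reconstruct the problem that the following solution answers.\n"
            f"Solution: {solution}"
        )
        # Step 2: fine-grained comparison of original vs. reconstructed problem.
        feedback = llm(
            "Compare the two problems condition by condition. List every "
            f"condition or question that differs, or reply {NO_ISSUES}.\n"
            f"Original: {problem}\nReconstructed: {reconstructed}"
        )
        if feedback.strip().upper() == NO_ISSUES:
            break  # nothing detected, keep the current solution
        # Step 3: feed the detected inconsistencies back for revision.
        solution = llm(
            "Revise the solution using the feedback.\n"
            f"Problem: {problem}\nSolution: {solution}\nFeedback: {feedback}"
        )
    return solution
```

Passing the model as a plain callable keeps the sketch independent of any particular API client; in practice `llm` would wrap whichever chat or completion endpoint is in use.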