Chain-of-Thought (CoT) prompting in large language models (LLMs) has shown promising performance on mathematical reasoning tasks. The recently proposed Self-Consistency method samples a diverse set of reasoning chains, which may arrive at different answers, and chooses the final answer by majority voting. Though effective, its performance plateaus: sampling more reasoning chains yields no further improvement. To address this problem, we propose to integrate backward reasoning into answer verification. We first mask a number in the question, replacing it with ${\bf x}$. The LLM is then asked to predict the masked number, with a candidate answer $A$ embedded in the template: ``If we know the answer to the above question is $\{A\}$, what is the value of unknown variable ${\bf x}$?'' If the provided candidate answer is correct, the LLM is expected to predict the masked number successfully. To further improve performance, we propose FOBAR (FOrward-BAckward Reasoning), which combines forward and backward reasoning to verify candidate answers. Experiments are performed on six standard mathematical data sets and three LLMs (text-davinci-003, GPT-3.5-Turbo, GPT-4). Results show that FOBAR achieves state-of-the-art performance. In particular, FOBAR outperforms Self-Consistency, which uses forward reasoning alone, demonstrating that combining forward and backward reasoning is better. It also outperforms existing verification methods, confirming the effectiveness of both the simple template used in backward reasoning and the proposed combination.
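The verification procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the helper names (`make_backward_questions`, `vote_fractions`, `combine`), the regular expression used to find numbers, and the rule for merging the two signals (a geometric mean of the forward vote share and the backward success share, with a small smoothing constant `eps`). The abstract does not specify any of these. No LLM is called; the sampled forward answers and the backward success counts are passed in as plain data.

```python
import re
from collections import Counter

# Template quoted in the abstract; the mask symbol is written as a plain "x" here.
TEMPLATE = ("If we know the answer to the above question is {A}, "
            "what is the value of unknown variable x?")


def make_backward_questions(question, candidate):
    """Return (masked_number, backward_prompt) pairs, one per number in the question."""
    pairs = []
    for m in re.finditer(r"\d+(?:\.\d+)?", question):
        # Replace exactly this occurrence of the number with the mask symbol.
        masked = question[:m.start()] + "x" + question[m.end():]
        pairs.append((m.group(), masked + " " + TEMPLATE.format(A=candidate)))
    return pairs


def vote_fractions(answers):
    """Share of sampled forward chains that ended in each answer (Self-Consistency votes)."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def combine(forward_answers, backward_correct, eps=1e-8):
    """Score each candidate by an assumed geometric mean of forward and backward shares.

    forward_answers:  list of answers from sampled forward chains.
    backward_correct: dict mapping candidate -> number of backward chains that
                      recovered the masked number when given that candidate.
    """
    fwd = vote_fractions(forward_answers)
    total_bwd = sum(backward_correct.get(a, 0) + eps for a in fwd)
    scores = {}
    for a, p_forward in fwd.items():
        p_backward = (backward_correct.get(a, 0) + eps) / total_bwd
        scores[a] = (p_forward * p_backward) ** 0.5
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    q = "Tom has 3 apples and buys 5 more. How many apples does he have?"
    for number, prompt in make_backward_questions(q, "8"):
        print(number, "->", prompt)

    # Hypothetical counts: "18" wins the forward vote, but "20" is verified more often.
    best, scores = combine(["18", "18", "18", "20", "20"], {"18": 1, "20": 4})
    print(best, scores)
```

In this toy input, majority voting alone would pick "18", while the combined score favours "20" because the backward checks succeed more often for it. The counts are made up and are not results from the paper.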