Chain-of-Thought (CoT) prompting has shown promising performance in various reasoning tasks. Recently, Self-Consistency \citep{wang2023selfconsistency} proposed sampling a diverse set of reasoning chains, which may lead to different answers, and selecting the answer that receives the most votes. In this paper, we propose a novel method that uses backward reasoning to verify candidate answers. We mask a token in the question with ${\bf x}$ and, given a candidate answer, ask the LLM to predict the masked token using \textit{a simple template}, i.e., ``\textit{\textbf{If we know the answer of the above question is \{a candidate answer\}, what is the value of unknown variable ${\bf x}$?}}'' Intuitively, the LLM is expected to predict the masked token successfully if the provided candidate answer is correct. We further propose FOBAR, which combines forward and backward reasoning to estimate the probability of candidate answers. We conduct extensive experiments on six data sets and three LLMs. Experimental results demonstrate that FOBAR achieves state-of-the-art performance on various reasoning benchmarks.
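The backward-verification idea can be illustrated with a minimal Python sketch. It builds the backward prompt from the template quoted in the abstract and then combines forward votes with backward verification counts. Several pieces are assumptions made only for illustration and are not specified in the abstract: the masked token is taken to be a number found by a regular expression, the helper names (`mask_number`, `backward_prompt`, `combine_scores`) are hypothetical, and the combination rule (a weighted geometric mean with weight `alpha`) is one plausible choice rather than necessarily the rule FOBAR uses. No LLM is called; the vote counts are supplied by hand.

```python
import re
from typing import Dict, Tuple

# Template quoted in the abstract, with the masked token written as plain "x".
TEMPLATE = ("If we know the answer of the above question is {answer}, "
            "what is the value of unknown variable x?")


def mask_number(question: str, index: int = 0) -> Tuple[str, str]:
    """Replace the index-th number in the question with 'x'.

    Assumption: the masked token is a number. Returns the masked
    question and the original number (the backward ground truth).
    """
    matches = list(re.finditer(r"\d+(?:\.\d+)?", question))
    if not matches:
        raise ValueError("question contains no number to mask")
    m = matches[index]
    masked = question[:m.start()] + "x" + question[m.end():]
    return masked, m.group()


def backward_prompt(question: str, candidate: str, index: int = 0) -> Tuple[str, str]:
    """Build the backward-reasoning prompt for one candidate answer."""
    masked, truth = mask_number(question, index)
    return masked + " " + TEMPLATE.format(answer=candidate), truth


def combine_scores(forward_votes: Dict[str, int],
                   backward_hits: Dict[str, int],
                   alpha: float = 0.5,
                   eps: float = 1e-8) -> Dict[str, float]:
    """Combine forward votes and backward verification counts.

    Assumption: a weighted geometric mean of the two normalized
    scores; eps keeps candidates with zero backward hits nonzero.
    """
    f_total = sum(forward_votes.values())
    b_total = sum(backward_hits.get(c, 0) + eps for c in forward_votes)
    raw = {}
    for c, votes in forward_votes.items():
        p_forward = votes / f_total
        p_backward = (backward_hits.get(c, 0) + eps) / b_total
        raw[c] = (p_forward ** alpha) * (p_backward ** (1 - alpha))
    z = sum(raw.values())
    return {c: s / z for c, s in raw.items()}


if __name__ == "__main__":
    q = "Tom has 3 apples and buys 5 more. How many apples does he have?"
    prompt, truth = backward_prompt(q, "8", index=1)
    print(prompt)
    # Hand-made counts: forward voting prefers "9", backward checks prefer "8".
    print(combine_scores({"8": 4, "9": 6}, {"8": 7, "9": 1}))
```

In this toy setting, forward voting alone would select "9", while the combined score favors "8" because the masked number is recovered far more often when "8" is supplied as the candidate answer. In practice, the backward counts would come from sampling the LLM on each backward prompt and comparing its prediction against the returned ground-truth token.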