Chain-of-Thought (CoT) prompting has shown promising performance in various reasoning tasks. Recently, Self-Consistency \citep{wang2023selfconsistency} was proposed: it samples a diverse set of reasoning chains, which may lead to different answers, and selects the answer that receives the most votes. In this paper, we propose a novel method that uses backward reasoning to verify candidate answers. We mask a token in the question by replacing it with ${\bf x}$, and then ask the LLM to predict the masked token given a candidate answer, using \textit{a simple template}, i.e., ``\textit{\textbf{If we know the answer of the above question is \{a candidate answer\}, what is the value of unknown variable ${\bf x}$?}}'' Intuitively, the LLM is expected to predict the masked token successfully if the provided candidate answer is correct. We further propose FOBAR, which combines forward and backward reasoning to estimate the probability of each candidate answer. We conduct extensive experiments on six data sets and three LLMs. Experimental results demonstrate that FOBAR achieves state-of-the-art performance on various reasoning benchmarks.
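As a minimal sketch of the two ingredients described above, the following Python builds a backward-reasoning prompt from the template and takes a Self-Consistency style majority vote. The function names (`mask_number`, `backward_prompt`, `majority_vote`) are hypothetical, and masking the $k$-th number with a regular expression is an assumption made for illustration; the abstract only states that a token in the question is masked. The LLM call itself and the way FOBAR combines forward and backward estimates are not shown.

```python
import re
from collections import Counter
from typing import List, Tuple

# Template quoted from the abstract; {answer} is filled with a candidate answer.
TEMPLATE = (
    "If we know the answer of the above question is {answer}, "
    "what is the value of unknown variable x?"
)


def mask_number(question: str, index: int = 0) -> Tuple[str, str]:
    """Replace the index-th number in the question with 'x'.

    Returns the masked question and the number that was hidden.
    Assumption: the masked token is a number found by a simple regex.
    """
    matches = list(re.finditer(r"\d+(?:\.\d+)?", question))
    if not matches:
        raise ValueError("question contains no number to mask")
    m = matches[index]
    masked = question[: m.start()] + "x" + question[m.end():]
    return masked, m.group()


def backward_prompt(question: str, candidate: str, index: int = 0) -> str:
    """Build the backward-reasoning prompt for one candidate answer."""
    masked, _ = mask_number(question, index)
    return f"{masked} {TEMPLATE.format(answer=candidate)}"


def majority_vote(answers: List[str]) -> str:
    """Forward step in the style of Self-Consistency: most frequent answer wins."""
    return Counter(answers).most_common(1)[0][0]
```

A candidate answer would count as verified when the model's reply to `backward_prompt(...)` matches the hidden number returned by `mask_number(...)`.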