Chain-of-Thought (CoT) prompting has shown promising performance in various reasoning tasks. Recently, Self-Consistency \citep{wang2023selfconsistency} proposed sampling a diverse set of reasoning chains, which may lead to different answers, and selecting the answer that receives the most votes. In this paper, we propose a novel method that uses backward reasoning to verify candidate answers. We mask a token in the question with ${\bf x}$ and, given a candidate answer, ask the LLM to predict the masked token using \textit{a simple template}, i.e., "\textit{\textbf{If we know the answer of the above question is \{a candidate answer\}, what is the value of unknown variable ${\bf x}$?}}" Intuitively, the LLM is expected to predict the masked token successfully if the provided candidate answer is correct. We further propose FOBAR, which combines forward and backward reasoning to estimate the probability of each candidate answer. We conduct extensive experiments on six data sets and three LLMs. Experimental results demonstrate that FOBAR achieves state-of-the-art performance on various reasoning benchmarks.
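The backward-verification idea can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the paper's implementation: the helper names (`mask_number`, `backward_prompt`, `forward_probs`, `backward_probs`, `combine`) are hypothetical, the masked token is assumed to be a number in the question, and the weighted geometric mean with weight `alpha` is one plausible way to combine the two estimates that the abstract itself does not specify. The LLM call is left out; the sketch assumes the forward answers and backward hit counts have already been collected.

```python
import re
from collections import Counter

# Template quoted from the abstract; "x" stands for the masked token.
TEMPLATE = ("If we know the answer of the above question is {answer}, "
            "what is the value of unknown variable x?")


def mask_number(question: str, index: int = 0) -> tuple[str, str]:
    """Replace the index-th number in the question with 'x'.

    Returns the masked question and the value that was masked.
    Masking only numbers is an assumption made for this sketch.
    """
    matches = list(re.finditer(r"\d+(?:\.\d+)?", question))
    m = matches[index]
    masked = question[:m.start()] + "x" + question[m.end():]
    return masked, m.group()


def backward_prompt(question: str, candidate: str, index: int = 0) -> tuple[str, str]:
    """Build the backward-reasoning prompt for one candidate answer."""
    masked, target = mask_number(question, index)
    return f"{masked} {TEMPLATE.format(answer=candidate)}", target


def forward_probs(answers: list[str]) -> dict[str, float]:
    """Self-Consistency style estimate: vote share of each sampled answer."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def backward_probs(hits: dict[str, int], eps: float = 1e-8) -> dict[str, float]:
    """Share of backward chains that recovered the masked token, per candidate.

    eps (an assumption) keeps candidates with zero hits from dividing by zero.
    """
    total = sum(hits.values()) + eps * len(hits)
    return {a: (h + eps) / total for a, h in hits.items()}


def combine(p_fwd: dict[str, float], p_bwd: dict[str, float],
            alpha: float = 0.5) -> dict[str, float]:
    """Assumed combination rule: normalized weighted geometric mean."""
    scores = {a: p_fwd[a] ** alpha * p_bwd[a] ** (1 - alpha) for a in p_fwd}
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}
```

In use, one would sample forward chains to obtain the candidate answers, call `backward_prompt` once per candidate and masked position, count how often the model's prediction equals the returned target, and pick the candidate with the highest combined score. When the forward votes are tied, the backward counts decide the outcome.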