While forward reasoning (i.e., finding the answer given the question) has been explored extensively in recent literature, backward reasoning remains relatively unexplored. We examine the backward reasoning capabilities of LLMs on Math Word Problems (MWPs): given a mathematical question and its answer, with some detail omitted from the question, can LLMs effectively retrieve the missing information? We modify three benchmark datasets to evaluate this task: GSM8k, SVAMP, and MultiArith, and find a significant drop in accuracy on this task compared to forward reasoning across SOTA LLMs (GPT4, GPT3.5, PaLM-2, and LLaMa). Motivated by the observation that backward reasoning can be seen as the ''inverse'' of forward reasoning, we propose variations of three different forward reasoning strategies to improve performance: Rephrase reformulates the given problem into a forward reasoning problem; PAL-Tools combines the idea of Program-Aided LLMs to produce a set of equations that can be solved by an external solver; and Check your Work exploits the availability of a high-accuracy natural verifier in the forward direction, interleaving solving and verification steps. Finally, observing that each of our base methods correctly solves a different set of problems, we propose a novel Bayesian formulation for creating an ensemble over the base methods to further boost accuracy. Extensive experiments demonstrate successive improvements in the performance of LLMs on the backward reasoning task using our strategies, with our ensemble-based method yielding significant gains over the SOTA forward reasoning strategies we adapt.
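The PAL-Tools idea described above — have the model emit the problem as equations in which the omitted quantity is an unknown, then hand them to an external solver — can be illustrated with a minimal sketch. This is not the paper's implementation; `solve_linear` is a hypothetical stand-in for the external solver, restricted here to a single linear unknown:

```python
from fractions import Fraction

def solve_linear(f):
    # A linear f(x) = a*x + b is fully determined by two evaluations:
    # b = f(0), a = f(1) - f(0); the root of f(x) = 0 is x = -b / a.
    b = f(Fraction(0))
    a = f(Fraction(1)) - b
    if a == 0:
        raise ValueError("equation is not linear in the unknown")
    return -b / a

# Backward MWP (illustrative): "A child has x apples, buys 5 more,
# and ends up with 12. What is x?"  ->  equation: x + 5 = 12
missing = solve_linear(lambda x: x + 5 - 12)
print(missing)  # 7
```

In the full method, the LLM would generate such equations from the masked question text, and an off-the-shelf solver would handle arbitrary systems rather than one linear unknown.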