Prior work has combined chain-of-thought prompting in large language models (LLMs) with programmatic representations to perform effective and transparent reasoning. While such an approach works very well for tasks that only require forward reasoning (e.g., straightforward arithmetic), it is less effective for constraint solving tasks that require more sophisticated planning and search. In this paper, we propose a new satisfiability-aided language modeling approach for improving the reasoning capabilities of LLMs. We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer. This approach has two key advantages. The declarative specification is closer to the problem description than the reasoning steps are, so the LLM can parse it more accurately. Furthermore, by offloading the actual reasoning task to an automated theorem prover, our approach can guarantee the correctness of the answer with respect to the parsed specification and avoid planning errors in the reasoning process. We evaluate SATLM on 6 different datasets and show that it consistently outperforms program-aided LMs in an imperative paradigm (PROGLM). In particular, SATLM outperforms PROGLM by 23% on a challenging subset of GSM; SATLM also achieves a new SoTA on LSAT, surpassing previous models that are trained on the full training set.
翻译:先前工作将大语言模型中的链式思考提示与程序化表示相结合,以实现有效且透明的推理。尽管这种方法在仅需前向推理(如简单算术)的任务中表现良好,但对于需要更复杂规划和搜索的约束求解任务效果欠佳。本文提出一种新的满足性辅助语言建模方法,以提升大语言模型的推理能力。我们利用大语言模型生成声明式任务规范而非命令式程序,并借助现成的自动定理证明器推导最终答案。该方法具有两大优势:声明式规范比推理步骤更贴近问题描述,因此大语言模型能更准确地解析它;此外,通过将实际推理任务外包给自动定理证明器,我们的方法能保证答案相对于解析规范的正确性,并避免推理过程中的规划错误。我们在6个不同数据集上对SATLM进行评估,结果表明其在命令式范式(PROGLM)中始终优于程序辅助语言模型。特别是在GSM的挑战性子集上,SATLM比PROGLM提升23%;在LSAT数据集上,SATLM还取得了新的最佳结果,超越了此前基于完整训练集训练的模型。