Prior work has combined chain-of-thought prompting in large language models (LLMs) with programmatic representations to perform effective and transparent reasoning. While such an approach works well for tasks that only require forward reasoning (e.g., straightforward arithmetic), it is less effective for constraint solving problems that require more sophisticated planning and search. In this paper, we propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of LLMs. We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer. This approach has two key advantages. The declarative specification is closer to the problem description than the reasoning steps are, so the LLM can parse it out of the description more accurately. Furthermore, by offloading the actual reasoning task to an automated theorem prover, our approach can guarantee the correctness of the answer with respect to the parsed specification and avoid planning errors in the solving process. We evaluate SATLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm. In particular, SATLM outperforms program-aided LMs by 23% on a challenging subset of the GSM arithmetic reasoning dataset; SATLM also achieves a new SoTA on LSAT and BoardgameQA, surpassing previous models that are trained on the respective training sets.
翻译:摘要:先前的工作已将大型语言模型(LLM)中的思维链提示与程序化表示相结合,以实现有效且透明的推理。尽管这种方法适用于仅需前向推理的任务(例如简单算术),但对于需要更复杂规划与搜索的约束求解问题,其效果欠佳。本文提出一种新的满足性辅助语言建模(SatLM)方法,以提升LLM的推理能力。我们利用LLM生成声明式任务规范而非命令式程序,并借助现成的自动定理证明器得出最终答案。该方法具有两大关键优势:声明式规范比推理步骤更贴近问题描述,因此LLM能更准确地从描述中解析出该规范;此外,通过将实际推理任务转移至自动定理证明器,我们的方法能够保证答案相对于解析规范的正确性,并避免求解过程中的规划错误。我们在8个不同数据集上评估了SatLM,结果表明其在命令式范式下始终优于程序辅助语言模型。具体而言,在GSM算术推理数据集的一个具有挑战性的子集上,SatLM比程序辅助语言模型提升了23%;此外,SatLM还在LSAT和BoardgameQA上取得了新的最先进成果,超越了先前在各自训练集上训练的模型。