Despite their linguistic competence, Large Language Models (LLMs) often exhibit limitations in their ability to reason reliably and flexibly. To address this, we propose a neurosymbolic approach that prompts LLMs to extract all relevant information from a problem statement, encode it as logical code statements, and then use a logic programming language (Prolog) to conduct the iterative computations of explicit deductive reasoning. Our approach significantly enhances the performance of LLMs on the standard mathematical reasoning benchmark GSM8k and on the Navigate task from the BIG-bench benchmark. Additionally, we introduce a novel dataset, the Non-Linear Reasoning (NLR) dataset, consisting of 55 unique word problems that target the shortcomings of the next-token-prediction paradigm of LLMs and that require complex non-linear reasoning but only basic arithmetic skills to solve. Our findings demonstrate that integrating Prolog enables LLMs to achieve high performance on the NLR dataset, which even the most advanced language models (including GPT-4) fail to solve using text alone.
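To make the pipeline concrete, the following is a minimal sketch (our illustration, not an example taken from the paper or its prompts) of the kind of Prolog encoding an LLM might produce for a toy GSM8k-style problem, with Prolog's inference engine then deriving the answer deductively:

```prolog
% Toy problem (hypothetical): "Jane has 3 apples. Tom has twice as
% many apples as Jane. How many apples do they have together?"

apples(jane, 3).                      % fact extracted from the statement

apples(tom, T) :-                     % relation: Tom has twice Jane's count
    apples(jane, J),
    T is 2 * J.

total_apples(Total) :-                % the quantity the question asks for
    apples(jane, J),
    apples(tom, T),
    Total is J + T.

% Query:  ?- total_apples(X).
% Result: X = 9.
```

The division of labor is the point of the sketch: the LLM only translates the problem into facts and rules, while the arithmetic and the chaining of deductive steps are delegated to the Prolog interpreter rather than performed via next-token prediction.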