Despite their linguistic competence, Large Language Models (LLMs) often exhibit limitations in their ability to reason reliably and flexibly. To address this, we propose a neurosymbolic approach that prompts LLMs to extract all relevant information from a problem statement, encode it as logical code statements, and then use a logic programming language (Prolog) to conduct the iterative computations of explicit deductive reasoning. Our approach significantly enhances the performance of LLMs on the standard mathematical reasoning benchmark GSM8k and on the Navigate task from the BIG-bench benchmark. Additionally, we introduce a novel dataset, the Non-Linear Reasoning (NLR) dataset, consisting of 55 unique word problems that target the shortcomings of the next-token-prediction paradigm of LLMs and that require complex non-linear reasoning but only basic arithmetic skills to solve. Our findings demonstrate that integrating Prolog enables LLMs to achieve high performance on the NLR dataset, which even the most advanced language models (including GPT-4) fail to solve using text alone.
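To make the pipeline concrete, the following is a minimal sketch (our illustration, not an example taken from the paper or its prompts) of the kind of Prolog encoding an LLM might produce for a toy GSM8k-style problem, with Prolog's inference engine then deriving the answer deductively:

```prolog
% Toy problem (hypothetical): "Jane has 3 apples. Tom has twice as
% many apples as Jane. How many apples do they have together?"

apples(jane, 3).                      % fact extracted from the statement

apples(tom, T) :-                     % relation: Tom has twice Jane's count
    apples(jane, J),
    T is 2 * J.

total_apples(Total) :-                % the quantity the question asks for
    apples(jane, J),
    apples(tom, T),
    Total is J + T.

% Query:  ?- total_apples(X).
% Result: X = 9.
```

The division of labor is the point of the sketch: the LLM only translates the problem into facts and rules, while the arithmetic and the chaining of deductive steps are delegated to the Prolog interpreter rather than performed via next-token prediction.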