We study a synthetic-corpus-based approach by which language models (LMs) acquire logical deductive reasoning ability. Previous studies generated deduction examples from specific sets of deduction rules. However, these rule sets were limited or otherwise arbitrary, which restricted the generalizability of the acquired reasoning ability. We revisit this choice and adopt a well-grounded set of deduction rules based on formal logic theory, which can derive any other deduction rule when combined in a multistep way. Using the proposed corpora, which we name FLD (Formal Logic Deduction), we first evaluate and analyze the logical reasoning ability of the latest LLMs. Even GPT-4 can solve only half of the problems, suggesting that pure logical reasoning isolated from knowledge remains challenging for LLMs and that additional training specialized in logical reasoning is indeed essential. We next verify empirically that LMs trained on the FLD corpora acquire more generalizable reasoning ability. Furthermore, we identify the aspects of reasoning ability that deduction corpora can enhance in LMs and those that they cannot, and we discuss future directions for each aspect. The released corpora serve both as learning resources and as challenging benchmarks.
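As a minimal illustration of the claim that a small set of basic rules can derive other deduction rules through multistep combination, the Lean 4 sketch below derives the syllogism (from p → q and q → r, conclude p → r) using only implication introduction and modus ponens. It is an illustrative example under our own naming, not an item drawn from the FLD corpora; the hypothesis names `hpq`, `hqr`, `hp`, and `hq` are assumptions introduced here.

```lean
-- Lean 4 sketch. The rule names in the comments are standard
-- natural-deduction terms, not identifiers taken from the FLD corpora.
example (p q r : Prop) (hpq : p → q) (hqr : q → r) : p → r :=
  fun hp =>                  -- implication introduction: assume p
    have hq : q := hpq hp    -- modus ponens on p → q and p
    hqr hq                   -- modus ponens on q → r and q
```

The derived rule takes three steps built from two basic rules, which is the sense in which a compound rule such as the syllogism need not be supplied separately.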