Logical reasoning is fundamental for humans yet presents a substantial challenge in Artificial Intelligence. Early researchers relied on Knowledge Representation and Reasoning (KR) systems, which did not scale and required non-trivial manual effort. Recently, large language models (LLMs) have demonstrated the ability to overcome various limitations of formal KR systems. Consequently, there is growing interest in using LLMs for logical reasoning via natural language. This work seeks to understand the proficiency of LLMs in logical reasoning by offering a brief review of the latest progress in this area, with a focus on logical reasoning datasets, tasks, and the methods adopted to utilize LLMs for reasoning. To offer a thorough analysis, we have compiled a benchmark titled LogiGLUE, which includes 24 varied datasets encompassing deductive, abductive, and inductive reasoning. Using LogiGLUE as a foundation, we have trained an instruction fine-tuned language model, LogiT5. We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning to assess the performance of the model across the different logical reasoning categories. We also assess various LLMs using LogiGLUE, and the findings indicate that LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We aim to shed light on the capabilities of LLMs in logical reasoning and on potential pathways for enhancing them, paving the way for more advanced and nuanced developments in this critical field.