This work presents a novel systematic methodology for analysing the capabilities and limitations of Large Language Models (LLMs) on logic theory induction, using feedback from a formal inference engine. The analysis is complexity-graded with respect to rule dependency structure, allowing specific inference challenges to be quantified in terms of their impact on LLM performance. Integrating LLMs with formal methods is a promising frontier in Natural Language Processing, as an important avenue for improving model inference control and explainability. In particular, inductive learning over complex sets of facts and rules poses unique challenges for current autoregressive models, as they lack explicit symbolic grounding. While LLMs can be complemented by formal systems, the properties they deliver for inductive learning are not well understood or quantified. Empirical results indicate that the largest LLMs can achieve competitive results against a state-of-the-art Inductive Logic Programming (ILP) system baseline, but also that tracking long predicate relationship chains is a harder obstacle for LLMs than theory complexity.