This work presents a novel systematic methodology for analysing the capabilities and limitations of Large Language Models (LLMs) on logic theory induction, using feedback from a formal inference engine. The analysis is complexity-graded w.r.t. rule dependency structure, allowing the quantification of specific inference challenges to LLM performance. Integrating LLMs with formal methods is a promising frontier in Natural Language Processing, as an important avenue for improving model inference control and explainability. In particular, inductive learning over complex sets of facts and rules poses unique challenges for current autoregressive models, as they lack explicit symbolic grounding. While they can be complemented by formal systems, the properties delivered by LLMs regarding inductive learning are not well understood or quantified. Empirical results indicate that the largest LLMs can achieve competitive results against a state-of-the-art Inductive Logic Programming (ILP) system baseline, but also that tracking long predicate relationship chains is a harder obstacle for LLMs than theory complexity.