This paper investigates the ability of transformer-based models to learn structural recursion from examples. Recursion is a universal concept in both natural and formal languages. Structural recursion is central to programming language and formal mathematics tasks in which symbolic tools currently outperform neural models, such as inferring semantic relations between datatypes and emulating program behavior. We introduce a general framework that connects the abstract concept of structural recursion in the programming language domain to concrete sequence modeling problems and to the behavior of learned models. The framework includes a representation that captures the general \textit{syntax} of structural recursion, coupled with two different frameworks for understanding its \textit{semantics} -- one that is more natural from a programming languages perspective and one that helps bridge that perspective with a mechanistic understanding of the underlying transformer architecture. Using this framework as a conceptual tool, we identify distinct issues under several setups. Models trained to emulate recursive computations do not fully capture the recursion; instead, they fit shortcut algorithms and therefore fail on certain edge cases that are under-represented in the training distribution. In addition, state-of-the-art large language models (LLMs) struggle to mine recursive rules from in-context demonstrations. These LLMs also fail in interesting ways when emulating reduction (step-wise computation) of a recursive function.
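For readers less familiar with the terms, the following is a minimal sketch of what structural recursion and step-wise reduction mean. It is not the representation or task set used in the paper: the Peano-style encoding and the names `Zero`, `Succ`, `add`, `show`, and `reduction_trace` are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Zero:
    """Base constructor of an illustrative Peano natural number."""


@dataclass(frozen=True)
class Succ:
    """Recursive constructor: the successor of another natural."""
    pred: "Nat"


Nat = Union[Zero, Succ]


def add(m: Nat, n: Nat) -> Nat:
    """Structural recursion on m: one case per constructor, and the
    recursive call is made only on the immediate sub-term m.pred."""
    if isinstance(m, Zero):
        return n
    return Succ(add(m.pred, n))


def show(n: Nat) -> str:
    """Render a natural as a string such as S(S(Z))."""
    return "Z" if isinstance(n, Zero) else f"S({show(n.pred)})"


def reduction_trace(m: Nat, n: Nat) -> List[str]:
    """List the intermediate terms obtained when add(m, n) is unfolded
    one rewrite step at a time (a hypothetical trace format)."""
    steps = []
    depth = 0
    # Each Succ on m is peeled off and moved outside the call to add.
    while isinstance(m, Succ):
        steps.append("S(" * depth + f"add({show(m)}, {show(n)})" + ")" * depth)
        m = m.pred
        depth += 1
    # Base case: add(Z, n) is still unreduced here...
    steps.append("S(" * depth + f"add({show(m)}, {show(n)})" + ")" * depth)
    # ...and rewrites to n in the final step.
    steps.append("S(" * depth + show(n) + ")" * depth)
    return steps


two = Succ(Succ(Zero()))
one = Succ(Zero())
for term in reduction_trace(two, one):
    print(term)
```

Under these assumptions, a model that emulates reduction would be asked to produce each intermediate term in turn, whereas a model that has fit a shortcut might produce the correct final term only for inputs resembling those seen in training.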