One way to interpret the reasoning power of transformer-based language models is to describe the types of logical rules they can resolve over an input text. Recently, Chiang et al. (2023) showed that finite-precision transformers can be equivalently expressed in a generalization of first-order logic. However, finite-precision transformers are a weak transformer variant because, as we show, a single attention head can only attend to a constant number of tokens and, in particular, cannot represent uniform attention. Since attending broadly is a core capability for transformers, we ask whether a minimally more expressive model that can attend universally can also be characterized in logic. To this end, we analyze transformers whose forward pass is computed in $\log n$ precision on contexts of length $n$. We prove that any log-precision transformer can be equivalently expressed as a first-order logic sentence that, in addition to standard universal and existential quantifiers, may also contain majority-vote quantifiers. This is the tightest known upper bound and the first logical characterization of log-precision transformers.
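The intuition behind the uniform-attention limitation can be illustrated with a small numerical sketch. It is not the formal precision model analyzed here: it assumes a toy fixed-point format in which every value is rounded to the nearest multiple of $2^{-p}$, and the names `round_fixed`, `uniform_weight`, and `log_precision_bits` are hypothetical helpers introduced only for illustration. Under that assumption, the uniform attention weight $1/n$ rounds to zero once $n$ is large relative to a fixed $p$, whereas letting $p$ grow like $\log n$ keeps it nonzero.

```python
def round_fixed(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (toy fixed-point model)."""
    scale = 2 ** frac_bits
    return int(x * scale + 0.5) / scale  # x is non-negative here, so int() floors


def uniform_weight(n: int, frac_bits: int) -> float:
    """Per-token weight of uniform attention over n tokens, after rounding."""
    return round_fixed(1.0 / n, frac_bits)


def log_precision_bits(n: int, c: int = 2) -> int:
    """Hypothetical choice of c * ceil(log2 n) fractional bits (at least c)."""
    return c * max(1, (n - 1).bit_length())


FIXED_BITS = 8  # assumed constant precision, independent of context length

for n in (16, 1024, 65536):
    fixed = uniform_weight(n, FIXED_BITS)
    logp = uniform_weight(n, log_precision_bits(n))
    print(f"n={n:6d}  fixed-precision weight={fixed:.6g}  log-precision weight={logp:.6g}")
```

With 8 fractional bits, the weight for $n = 16$ is representable, but for $n = 1024$ it is rounded to zero, so a head in this toy model cannot spread its attention evenly over the whole context. With a bit budget that scales as $\log n$, the weight $1/n$ stays nonzero at every context length.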