Hard-attention Chain-of-Thought (CoT) transformers are known to be Turing-complete, but whether softmax-attention CoT transformers are Turing-complete remains an open problem. In this paper, we prove a stronger result: length-generalizable softmax CoT transformers are Turing-complete. More precisely, our Turing-completeness proof goes via the CoT extension of the Counting RASP (C-RASP), which corresponds to softmax CoT transformers that admit length generalization. We prove Turing-completeness of CoT C-RASP with causal masking over a unary alphabet (and, more generally, over letter-bounded languages). While we show that this fragment is not Turing-complete over arbitrary languages, we prove that its extension with relative positional encodings is Turing-complete over arbitrary languages. We empirically validate our theory by training transformers on languages that require complex (non-linear) arithmetic reasoning.