This article presents a theoretical evaluation of the computational universality of decoder-only transformer models. We extend the theoretical literature on transformer models and show that decoder-only transformer architectures, even with only a single layer and a single attention head, are Turing complete under reasonable assumptions. Our analysis further shows that sparsity (or compressibility) of the word embedding is a necessary condition for Turing completeness to hold.