This article presents a theoretical evaluation of the computational universality of decoder-only transformer models. We extend the theoretical literature on transformer models and show that decoder-only transformer architectures, even with only a single layer and a single attention head, are Turing complete under reasonable assumptions. Our analysis further shows that sparsity (compressibility) of the word embedding is a necessary condition for Turing completeness to hold.