We analyse the computational power of transformer encoders as sequence-to-sequence functions on vectors. We show that average hard attention can be used to simulate arithmetic circuits when the circuits are given as input to the encoder. The circuit families that can be simulated in this way have constant depth and use unbounded fan-in addition, binary multiplication, and sign gates. The transformers we use have arithmetic circuits in place of feed-forward networks. With typical average attention, the functions these transformers compute can in turn be computed by the same class of circuit families. Our results hold for transformers over the reals, the rationals, and any ring in between the two.
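To make the attention mechanism concrete, the following is a minimal sketch of average hard attention for a single query position, under the common definition in which the output is the uniform average of the value vectors at the positions with the maximal attention score. The function name `average_hard_attention` and the list-based representation are illustrative assumptions rather than the construction used in the paper; `Fraction` is used so that the arithmetic stays exact over the rationals.

```python
from fractions import Fraction


def average_hard_attention(scores, values):
    """Average the value vectors at the positions attaining the maximal score.

    scores: one attention score per input position (hypothetical input format).
    values: one value vector (a list of numbers) per input position.
    """
    if not scores or len(scores) != len(values):
        raise ValueError("scores and values must be non-empty and of equal length")

    top = max(scores)
    # Keep only the value vectors at positions tied for the maximal score.
    winners = [v for s, v in zip(scores, values) if s == top]

    dim = len(winners[0])
    # Uniform average over the tied positions, coordinate by coordinate.
    return [sum(v[k] for v in winners) / len(winners) for k in range(dim)]


scores = [Fraction(1), Fraction(3), Fraction(3)]
values = [
    [Fraction(1), Fraction(0)],
    [Fraction(2), Fraction(4)],
    [Fraction(4), Fraction(0)],
]
result = average_hard_attention(scores, values)
```

In this example positions 2 and 3 tie for the maximal score, so `result` is the average of their value vectors, `[3, 2]`. Note that the averaging step amounts to one addition over all tied positions followed by a division by their count, so no exponential or softmax is needed.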