The expressive power of transformers over inputs of unbounded size can be studied through their ability to recognize classes of formal languages. In this paper, we establish exact characterizations of transformers with hard attention (in which all attention is focused on exactly one position) and attention masking (in which each position only attends to positions on one side). With strict masking (each position cannot attend to itself) and without position embeddings, these transformers are expressively equivalent to linear temporal logic (LTL), which defines exactly the star-free languages. A key technique is the use of Boolean RASP as a convenient intermediate language between transformers and LTL. We then take numerous results known for LTL and apply them to transformers, showing how position embeddings, strict masking, and depth all increase expressive power.
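To make the equivalence with star-free languages concrete, consider a standard example (our illustration, not one taken from the paper): over $\Sigma = \{a, b\}$, the language $a^* b^*$ is star-free, since it can be written using only concatenation and complementation as
\[
  a^* b^* \;=\; \overline{\,\Sigma^*\, b\, \Sigma^*\, a\, \Sigma^*\,}
\]
(where $\Sigma^*$ is itself star-free, being the complement of the empty language), and it is defined by the LTL formula $\mathbf{G}(b \rightarrow \mathbf{G}\, b)$, evaluated at the first position of the word: once a $b$ occurs, only $b$'s may follow.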