The expressive power of transformers over inputs of unbounded size can be studied through their ability to recognize classes of formal languages. We consider transformer encoders with hard attention (in which all attention is focused on exactly one position) and strict future masking (in which each position only attends to positions strictly to its left), and prove that they are equivalent to linear temporal logic (LTL), which defines exactly the star-free languages. A key technique is the use of Boolean RASP as a convenient intermediate language between transformers and LTL. We then take numerous results known for LTL and apply them to transformers, characterizing how position embeddings, strict masking, and depth increase expressive power.
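As an illustration of the two attention restrictions named above, the following minimal Python sketch computes one hard-attention step under strict future masking. It is a sketch under stated assumptions, not the paper's construction. The function name `strict_left_hard_attention`, the `score`/`value`/`default` parameters, and the choice to break ties toward the rightmost position are all hypothetical, introduced only for this example.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # type of input symbols
V = TypeVar("V")  # type of attended values


def strict_left_hard_attention(
    xs: Sequence[S],
    score: Callable[[S, S], float],
    value: Callable[[S], V],
    default: V,
) -> List[V]:
    """One hard-attention step with strict future masking (illustrative only).

    Position i may attend only to positions j < i, and all attention goes to
    exactly one such j: the highest-scoring one. Ties are broken toward the
    rightmost position (an assumption made for this sketch). If no position
    lies strictly to the left, `default` is returned for that position.
    """
    out: List[V] = []
    for i in range(len(xs)):
        best_j = None
        best_score = None
        for j in range(i):  # strict masking: j ranges over positions < i
            s = score(xs[i], xs[j])
            # ">=" lets a later position win ties, i.e. rightmost tie-breaking
            if best_score is None or s >= best_score:
                best_j, best_score = j, s
        out.append(default if best_j is None else value(xs[best_j]))
    return out


# Example: at each position, is there an "a" strictly to the left?
has_a_before = strict_left_hard_attention(
    "bbab",
    score=lambda query, key: 1.0 if key == "a" else 0.0,
    value=lambda key: key == "a",
    default=False,
)
print(has_a_before)  # [False, False, False, True]
```

In this example the output at each position is a Boolean that depends only on the strict past, which is the kind of position-wise Boolean value that a temporal-logic formula also assigns to each position of a string.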