We consider transformer encoders with hard attention (in which all attention is focused on exactly one position) and strict future masking (in which each position only attends to positions strictly to its left), and prove that the class of languages recognized by these networks is exactly the star-free languages. Adding position embeddings increases the class of recognized languages to other well-studied classes. A key technique in these proofs is Boolean RASP, a variant of RASP that is restricted to Boolean values. Via the star-free languages, we relate transformers to first-order logic, temporal logic, and algebraic automata theory.
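To make the two restrictions named above concrete, the following is a minimal sketch of a single hard-attention step with strict future masking over Boolean values. It is an illustration only, not the construction used in the proofs: the function names `masked_hard_attention` and `seen_a_before`, the `rightmost` tie-breaking flag, and the per-position default value are all assumptions introduced here.

```python
from typing import Callable, List


def masked_hard_attention(
    n: int,
    score: Callable[[int, int], bool],
    value: List[bool],
    default: List[bool],
    rightmost: bool = True,
) -> List[bool]:
    """One hard-attention step with strict future masking (hypothetical helper).

    For each position i, only positions j < i are visible. Among the visible
    positions where score(i, j) holds, exactly one is selected (the rightmost
    or the leftmost), and its value is copied. If no position qualifies,
    default[i] is used instead.
    """
    out = []
    for i in range(n):
        # Strict future masking: j ranges over positions strictly left of i.
        candidates = [j for j in range(i) if score(i, j)]
        if candidates:
            # Hard attention: all attention goes to a single position.
            j = candidates[-1] if rightmost else candidates[0]
            out.append(value[j])
        else:
            out.append(default[i])
    return out


def seen_a_before(w: str) -> List[bool]:
    """For each position, report whether an 'a' occurs strictly to its left."""
    n = len(w)
    return masked_hard_attention(
        n,
        score=lambda i, j: w[j] == "a",  # attend to positions holding 'a'
        value=[True] * n,                # an attended position reports True
        default=[False] * n,             # no 'a' to the left reports False
    )


print(seen_a_before("bbab"))  # [False, False, False, True]
```

Because the mask is strict, position 0 never sees anything and always falls back to its default, and the `a` at position 2 only becomes visible from position 3 onward.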