We consider transformer encoders with hard attention (in which all attention is focused on exactly one position) and strict future masking (in which each position attends only to positions strictly to its left), and prove that the class of languages recognized by these networks is exactly the class of star-free languages. Adding position embeddings enlarges the class of recognized languages to other well-studied classes. A key technique in these proofs is Boolean RASP, a variant of RASP that is restricted to Boolean values. Via the star-free languages, we relate transformers to first-order logic, temporal logic, and algebraic automata theory.
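To make the two restrictions concrete, here is a minimal Python sketch of a single strict-future-masked hard-attention step and a small Boolean program built from it. It is loosely modeled on the idea of Boolean RASP but is not the paper's syntax or construction. The function names `attend_strict_past` and `accepts_ab_plus`, the choice of rightmost/leftmost tie-breaking, the alphabet {a, b}, and the rejection of the empty string are all illustrative assumptions. The example language (ab)+ is star-free: a nonempty string is in it exactly when it does not start with b, does not end with a, and contains neither aa nor bb.

```python
from typing import Callable, List


def attend_strict_past(
    n: int,
    score: Callable[[int, int], bool],
    value: Callable[[int, int], object],
    default: Callable[[int], object],
    tiebreak: str = "rightmost",
) -> List[object]:
    """One hard-attention step under strict future masking (illustrative sketch).

    Position i may only look at positions j < i. Among those with
    score(i, j) True, exactly one is chosen (rightmost or leftmost, an
    assumed tie-breaking rule) and value(i, j) is returned; if no position
    qualifies, default(i) is returned instead.
    """
    if tiebreak not in ("rightmost", "leftmost"):
        raise ValueError("tiebreak must be 'rightmost' or 'leftmost'")
    out = []
    for i in range(n):
        candidates = [j for j in range(i) if score(i, j)]  # strict mask: j < i only
        if not candidates:
            out.append(default(i))
        else:
            j = max(candidates) if tiebreak == "rightmost" else min(candidates)
            out.append(value(i, j))
    return out


def accepts_ab_plus(w: str) -> bool:
    """Hypothetical Boolean program for the star-free language (ab)+ over {a, b}."""
    if any(c not in "ab" for c in w):
        raise ValueError("alphabet is assumed to be {a, b}")
    n = len(w)
    if n == 0:
        return False  # simplifying assumption: the empty string is rejected
    is_a = [c == "a" for c in w]

    def always(i: int, j: int) -> bool:
        return True

    # With an always-true score, the rightmost j < i is the predecessor i - 1.
    has_prev = attend_strict_past(n, always, lambda i, j: True, lambda i: False)
    prev_is_a = attend_strict_past(n, always, lambda i, j: is_a[j], lambda i: False)
    # Position-wise Boolean ops: a local violation is a leading b,
    # or two equal adjacent symbols (aa or bb).
    bad = [
        (not has_prev[i] and not is_a[i]) or (has_prev[i] and is_a[i] == prev_is_a[i])
        for i in range(n)
    ]
    # Is there any violation strictly to the left of position i?
    seen_bad = attend_strict_past(n, lambda i, j: bad[j], lambda i, j: True, lambda i: False)
    last = n - 1
    # Decide at the last position: no violation anywhere and the last symbol is b.
    return not seen_bad[last] and not bad[last] and not is_a[last]


if __name__ == "__main__":
    print([w for w in ["ab", "abab", "aba", "aab", "ba"] if accepts_ab_plus(w)])  # ['ab', 'abab']
```

Because strict masking prevents the last position from attending to itself, its own violation bit `bad[last]` is checked position-wise rather than through attention; every earlier violation is gathered by the final attention step.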