Deriving formal bounds on the expressivity of transformers, as well as studying transformers that are constructed to implement known algorithms, are both effective methods for better understanding the computational power of transformers. Towards both ends, we introduce the temporal counting logic $\textbf{K}_\text{t}$[#] alongside the RASP variant $\textbf{C-RASP}$. We show they are equivalent to each other, and that together they are the best-known lower bound on the formal expressivity of future-masked soft attention transformers with unbounded input size. We prove this by showing all $\textbf{K}_\text{t}$[#] formulas can be compiled into these transformers. As a case study, we demonstrate on paper how to use $\textbf{C-RASP}$ to construct simple transformer language models that, using greedy decoding, can only generate sentences that have given properties formally specified in $\textbf{K}_\text{t}$[#].
翻译:推导Transformer表达能力的正式界限,以及研究为执行已知算法而构建的Transformer,都是理解Transformer计算能力的有效方法。为此,我们引入了时序计数逻辑$\textbf{K}_\text{t}$[#]及其对应的RASP变体$\textbf{C-RASP}$。我们证明二者等价,并且它们共同构成了在输入规模无界的未来掩码软注意力Transformer中,目前已知的最好正式表达能力下界。我们通过证明所有$\textbf{K}_\text{t}$[#]公式均可编译为这类Transformer来验证这一结论。作为案例研究,我们展示了如何利用$\textbf{C-RASP}$构建简单的Transformer语言模型,这些模型在使用贪心解码时,仅能生成满足$\textbf{K}_\text{t}$[#]中正式指定属性的句子。