We give a novel logical characterization of encoder-decoder transformers, the foundational architecture for LLMs that also sees use in various settings that benefit from cross-attention. We study such transformers over text in the practical setting of floating-point numbers and soft-attention, characterizing them with a new temporal logic. This logic extends propositional logic with a counting global modality over the encoder input and a past modality over the decoder input. We also give an additional characterization of such transformers via a type of distributed automata, and show that our results are not limited to the specific choices in the architecture and can account for changes in, e.g., masking. Finally, we discuss encoder-decoder transformers in the autoregressive setting.
翻译:我们给出了编码器-解码器Transformer的一种新颖逻辑刻画,这是大语言模型的基础架构,也广泛应用于受益于交叉注意力的各类场景中。我们在浮点数和软注意力的实际设定下研究处理文本的此类Transformer,并用一种新的时序逻辑对其进行刻画。该逻辑在命题逻辑的基础上扩展了编码器输入上的计数全局模态算子以及解码器输入上的过去模态算子。我们还通过一类分布式自动机给出了此类Transformer的另一种刻画,并证明我们的结果不局限于架构中的特定选择,能够适应例如掩码等变化。最后,我们讨论了自回归设定下的编码器-解码器Transformer。