Counting properties (e.g., determining whether certain tokens occur more often than other tokens in a given input text) have played a significant role in the study of the expressiveness of transformers. In this paper, we provide a formal framework for investigating the counting power of transformers. We argue that all existing results demonstrate transformers' expressivity only for (semi-)linear counting properties, i.e., those expressible as a boolean combination of linear inequalities. Our main result is that transformers can express counting properties that are highly nonlinear. More precisely, we prove that transformers can capture all semialgebraic counting properties, i.e., those expressible as a boolean combination of inequalities over arbitrary multivariate polynomials (of any degree). Among other things, these generalize the counting properties that can be captured by C-RASP softmax transformers, which capture only linear counting properties. To complement this result, we exhibit a natural subclass of (softmax) transformers that completely characterizes semialgebraic counting properties. Through connections with Hilbert's tenth problem, this expressivity of transformers also yields a new undecidability result for analyzing an extremely simple transformer model, surprisingly one with neither positional encodings (i.e., NoPE-transformers) nor masking. We also experimentally validate the trainability of such counting properties.
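To make the distinction between linear and semialgebraic counting properties concrete, the following minimal Python sketch evaluates one property of each kind directly on token counts. It does not implement a transformer or any construction from the paper; the function names (`token_counts`, `more_a_than_b`, `b_is_square_of_a`) and the specific polynomial constraint are hypothetical, chosen only for illustration.

```python
from collections import Counter


def token_counts(tokens, alphabet):
    """Return the number of occurrences of each symbol of `alphabet` in `tokens`."""
    counter = Counter(tokens)
    return tuple(counter[symbol] for symbol in alphabet)


def more_a_than_b(tokens):
    """Linear counting property: n_a - n_b > 0."""
    n_a, n_b = token_counts(tokens, "ab")
    return n_a - n_b > 0


def b_is_square_of_a(tokens):
    """Illustrative semialgebraic (degree-2) counting property: n_a^2 - n_b = 0.

    Written as a boolean combination of polynomial inequalities:
    (n_a^2 - n_b >= 0) and (n_b - n_a^2 >= 0).
    """
    n_a, n_b = token_counts(tokens, "ab")
    return (n_a * n_a - n_b >= 0) and (n_b - n_a * n_a >= 0)


if __name__ == "__main__":
    text = list("aabbbb")  # n_a = 2, n_b = 4
    print(more_a_than_b(text))
    print(b_is_square_of_a(text))
    # Both properties depend only on the counts, not on token order.
    print(b_is_square_of_a(list("bbabab")))
```

Both functions depend only on how often each token occurs, so reordering the input does not change the outcome. This is the sense in which such properties are counting properties, and it is consistent with the abstract's setting of a model with neither positional encodings nor masking.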