Empirical studies have identified a range of learnability biases and limitations of transformers, including a persistent difficulty in learning to compute simple formal languages such as PARITY and a bias towards low-degree functions. However, theoretical understanding remains limited, with existing expressiveness theory either overpredicting or underpredicting realistic learning abilities. We prove that, under the transformer architecture, the loss landscape is constrained by the input-space sensitivity: transformers whose output is sensitive to many parts of the input string inhabit isolated points in parameter space, leading to a low-sensitivity bias in generalization. We show theoretically and empirically that this theory unifies a broad array of empirical observations about the learning abilities and biases of transformers, such as their generalization bias towards low sensitivity and low degree, and their difficulty with length generalization for PARITY. This shows that understanding transformers' inductive biases requires studying not just their in-principle expressivity, but also their loss landscape.
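To make the notion of input-space sensitivity concrete, the following minimal sketch computes the average sensitivity of a Boolean function by brute force. It assumes the standard Boolean-function definition (the number of single-bit flips that change the output, averaged over all inputs), which may differ in detail from the formal definition used in the paper. The function names `sensitivity_at`, `average_sensitivity`, `parity`, `majority`, and `first_bit` are illustrative and do not come from the source.

```python
from itertools import product


def sensitivity_at(f, x):
    """Count the positions i where flipping bit i of x changes f(x)."""
    fx = f(x)
    count = 0
    for i in range(len(x)):
        flipped = x[:i] + (1 - x[i],) + x[i + 1:]
        if f(flipped) != fx:
            count += 1
    return count


def average_sensitivity(f, n):
    """Average sensitivity of f over all 2**n bit strings of length n."""
    total = sum(sensitivity_at(f, x) for x in product((0, 1), repeat=n))
    return total / 2 ** n


def parity(x):
    # 1 if the number of ones is odd, else 0.
    return sum(x) % 2


def majority(x):
    # 1 if more than half of the bits are ones, else 0.
    return int(sum(x) > len(x) / 2)


def first_bit(x):
    # Depends on a single input position only.
    return x[0]


if __name__ == "__main__":
    n = 5
    for name, f in [("parity", parity), ("majority", majority), ("first_bit", first_bit)]:
        print(name, average_sensitivity(f, n))
```

Under this definition, PARITY on n bits attains the maximal average sensitivity of n, since every single-bit flip changes the output, whereas a function that reads one position has average sensitivity 1. This is the sense in which PARITY sits at the high-sensitivity extreme that the abstract describes as hard to learn.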