Length generalization is a key property of a learning algorithm that enables it to make correct predictions on inputs of any length, given finite training data. To provide such a guarantee, one needs to be able to compute a length generalization bound, beyond which the model is guaranteed to generalize. This paper concerns the open problem of whether such generalization bounds are computable for C-RASP, a class of languages closely linked to transformers. A partial positive result was recently shown by Chen et al. for C-RASP with only one layer and, under some restrictions, also with two layers. We resolve this open problem completely. Our main result is the non-existence of computable length generalization bounds for C-RASP (already with two layers) and hence for transformers. To complement this, we provide a computable bound for the positive fragment of C-RASP, which we show to be equivalent to fixed-precision transformers. For both positive C-RASP and fixed-precision transformers, we show that the length complexity is exponential and prove that the bounds are optimal.