We examine two in-context learning (ICL) tasks involving mathematical functions across several training and test settings for transformer models. Our study generalizes prior work on linear functions by showing that small transformers, even attention-only models, can approximate arbitrary polynomial functions, and hence continuous functions, under certain conditions. Our models can also approximate previously unseen classes of polynomial functions, as well as the zeros of complex functions. They perform far better on these tasks than LLMs such as GPT-4 and exhibit complex reasoning when provided with suitable training data and methods. Our models also have important limitations: they fail to generalize outside their training distributions and thus do not learn the class forms of functions. We explain why this is so.
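As a rough illustration of the kind of ICL task over polynomial function classes described above, the following sketch (our own construction, with hypothetical helper names such as `sample_polynomial` and `build_icl_prompt`; it does not reproduce the paper's exact setup) shows how in-context prompts of interleaved (x, f(x)) pairs followed by a query point might be generated for training a small transformer.

```python
# Minimal sketch (assumed setup, not the paper's exact procedure):
# building in-context prompts for a polynomial-function ICL task.
# A transformer would be trained to predict f(x_query) from the
# (x_i, f(x_i)) pairs that precede it in the prompt.
import numpy as np

def sample_polynomial(max_degree=3, coeff_scale=1.0, rng=None):
    """Draw a random polynomial by sampling degree and coefficients."""
    rng = rng or np.random.default_rng()
    degree = rng.integers(1, max_degree + 1)
    coeffs = rng.normal(0.0, coeff_scale, size=degree + 1)
    return np.polynomial.Polynomial(coeffs)

def build_icl_prompt(poly, n_examples=10, x_range=(-2.0, 2.0), rng=None):
    """Return (prompt, target): the prompt interleaves x_i and f(x_i)
    pairs and ends with a query x whose value the model must predict."""
    rng = rng or np.random.default_rng()
    xs = rng.uniform(*x_range, size=n_examples + 1)
    ys = poly(xs)
    prompt = np.empty(2 * n_examples + 1)
    prompt[0:2 * n_examples:2] = xs[:-1]   # context inputs
    prompt[1:2 * n_examples:2] = ys[:-1]   # context targets
    prompt[-1] = xs[-1]                    # query input
    return prompt, ys[-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    poly = sample_polynomial(rng=rng)
    prompt, target = build_icl_prompt(poly, rng=rng)
    print("coefficients:", np.round(poly.coef, 3))
    print("prompt:", np.round(prompt, 3))
    print("target f(x_query):", round(float(target), 3))
```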