In-context learning (ICL) has emerged as a powerful paradigm for easily adapting Large Language Models (LLMs) to a variety of tasks. However, our understanding of how ICL works remains limited. We study ICL in a controlled setting, training a range of GPT-2-like transformer models from scratch on synthetic data to learn univariate linear functions in-context. Our findings challenge the prevailing narrative that transformers implement algorithmic solutions such as linear regression to learn a linear function in-context: the models fail to generalize beyond their training distribution, revealing fundamental limitations in their capacity to infer abstract task structure. Based on these experiments, we propose a mathematically precise hypothesis of what the models may actually be learning.
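To make the setup concrete, here is a minimal sketch of the kind of synthetic ICL task described above, together with the least-squares predictor that the "algorithmic" narrative would attribute to the transformer. The prompt format, sampling distributions, and function names are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def make_prompt(k, rng):
    """Sample one in-context prompt for a univariate linear function.

    Assumed format: (x_1, y_1, ..., x_k, y_k, x_query) with y_i = w * x_i
    for a weight w drawn fresh per prompt (an illustrative choice).
    """
    w = rng.normal()              # task parameter, resampled per prompt
    x = rng.normal(size=k + 1)    # k demonstration inputs + 1 query input
    y = w * x
    prompt = np.empty(2 * k + 1)
    prompt[0:2 * k:2] = x[:k]     # interleave inputs...
    prompt[1:2 * k:2] = y[:k]     # ...with their targets
    prompt[-1] = x[-1]            # append the query input
    return prompt, y[-1]          # sequence and ground-truth answer

def least_squares_baseline(prompt):
    """The 'algorithmic' predictor transformers are often claimed to match:
    fit w by least squares on the in-context pairs, apply it to the query."""
    xs, ys, xq = prompt[0:-1:2], prompt[1:-1:2], prompt[-1]
    w_hat = xs @ ys / (xs @ xs)   # closed-form 1-D least squares
    return w_hat * xq

rng = np.random.default_rng(0)
prompt, target = make_prompt(k=8, rng=rng)
pred = least_squares_baseline(prompt)
```

In this noiseless setting the least-squares baseline recovers the true weight exactly; the interesting question the abstract raises is whether a trained transformer matches this behavior off its training distribution.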