In-context learning is a surprising and important phenomenon that emerged when modern language models were scaled to billions of learned parameters. A large language model can be tuned to perform various downstream natural language tasks, without any modification to its weights, simply by including concatenated training examples of these tasks in its input. Though disruptive for many practical applications of large language models, this emergent learning paradigm is not well understood from a theoretical perspective. In this paper, we propose a first-of-its-kind PAC-based framework for in-context learnability, and use it to provide the first finite sample complexity results for the in-context learning setup. Our framework includes an initial pretraining phase, which fits a function to the pretraining distribution, followed by an in-context learning phase, which keeps this function constant and concatenates training examples of the downstream task in its input. We use our framework to prove that, under mild assumptions, when the pretraining distribution is a mixture of latent tasks (a model often considered for natural language pretraining), these tasks can be efficiently learned via in-context learning, even though the model's weights are unchanged and the input significantly diverges from the pretraining distribution. Our theoretical analysis reveals that in this setting, in-context learning is more about identifying the task than about learning it, a result that is in line with a series of recent empirical findings. We hope that the in-context learnability framework presented in this paper will facilitate future progress towards a deeper understanding of this important new learning paradigm.
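The claim that in-context learning is closer to task identification than to task learning can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three task tables in `TASKS`, the uniform `PRIOR`, and the use of an exact Bayesian mixture predictor as a stand-in for a model that has fit the pretraining distribution. The predictor is fixed throughout; only the concatenated examples in the prompt change.

```python
import math
import random

# Hypothetical toy setup (not from the paper): three latent tasks, each a
# table giving P(y = 1 | x) for inputs x in {0, 1, 2}.
TASKS = [
    [0.9, 0.1, 0.5],
    [0.1, 0.9, 0.5],
    [0.5, 0.5, 0.9],
]
# Assumed mixture weights of the pretraining distribution.
PRIOR = [1 / 3, 1 / 3, 1 / 3]


def likelihood(task, x, y):
    """P(y | x) under one latent task."""
    p_one = task[x]
    return p_one if y == 1 else 1.0 - p_one


def task_posterior(examples):
    """Posterior over latent tasks given in-context (x, y) pairs."""
    log_post = [math.log(w) for w in PRIOR]
    for x, y in examples:
        for k, task in enumerate(TASKS):
            log_post[k] += math.log(likelihood(task, x, y))
    # Normalize in log space for numerical stability.
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def predict(examples, query_x):
    """P(y = 1 | query_x, examples) under the fixed mixture model."""
    post = task_posterior(examples)
    return sum(w * task[query_x] for w, task in zip(post, TASKS))


def sample_examples(task_index, n, rng):
    """Draw n in-context examples from one downstream task."""
    examples = []
    for _ in range(n):
        x = rng.randrange(3)
        y = 1 if rng.random() < TASKS[task_index][x] else 0
        examples.append((x, y))
    return examples


rng = random.Random(0)
for n in (0, 5, 50):
    prompt = sample_examples(task_index=1, n=n, rng=rng)
    posterior = [round(p, 3) for p in task_posterior(prompt)]
    print(n, posterior, round(predict(prompt, 1), 3))
```

With an empty prompt the posterior equals the prior, and as more examples from one task are concatenated it tends to concentrate on that task, so the prediction approaches that task's own table entry without any parameter being updated. The sketch deliberately ignores the distribution shift between concatenated prompts and pretraining data that the abstract mentions.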