The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning within a much broader spectrum of meta-learned in-context learning. Indeed, we suggest that any distribution of sequences in which context non-trivially decreases loss on subsequent predictions can be interpreted as eliciting a kind of in-context learning. We suggest that this perspective helps to unify the broad set of in-context abilities that language models exhibit, such as adapting to tasks from instructions or role play, or extrapolating time series. This perspective also sheds light on potential roots of in-context learning in lower-level processing of linguistic dependencies (e.g. coreference or parallel structures). Finally, taking this perspective highlights the importance of generalization, which we suggest can be studied along several dimensions: not only the ability to learn something novel, but also flexibility in learning from different presentations, and in applying what is learned. We discuss broader connections to past literature in meta-learning and goal-conditioned agents, and other perspectives on learning and adaptation. We close by suggesting that research on in-context learning should consider this broader spectrum of in-context capabilities and types of generalization.