Large language models (LLMs) exhibit an intriguing ability to learn a novel task from in-context examples presented in a demonstration, termed in-context learning (ICL). Understandably, a swath of research has been dedicated to uncovering the theories underpinning ICL. One popular hypothesis explains ICL by task selection: LLMs identify the task from the demonstration and generalize it to the prompt. Another popular hypothesis is that ICL is a form of meta-learning, i.e., the models learn a learning algorithm at pre-training time and apply it to the demonstration. Finally, a third hypothesis argues that LLMs use the demonstration to select a composition of tasks learned during pre-training to perform ICL. In this paper, we empirically explore these three hypotheses about LLMs' ability to learn in context, using a suite of experiments derived from common text classification tasks. We invalidate the first two hypotheses with counterexamples and provide evidence in support of the last hypothesis. Our results suggest that an LLM could learn a novel task in context by composing tasks learned during pre-training.