The predictions of Large Language Models (LLMs) on downstream tasks often improve significantly when examples of the input–label relationship are included in the context. However, there is currently no consensus on how this in-context learning (ICL) ability of LLMs works. For example, while Xie et al. (2021) liken ICL to a general-purpose learning algorithm, Min et al. (2022) argue that ICL does not even learn label relationships from in-context examples. In this paper, we provide novel insights into how ICL leverages label information, revealing both capabilities and limitations. To obtain a comprehensive picture of ICL behavior, we study probabilistic aspects of ICL predictions and thoroughly examine the dynamics of ICL as more examples are provided. Our experiments show that ICL predictions almost always depend on in-context labels and that ICL can learn truly novel tasks in-context. However, we also find that ICL struggles to fully overcome prediction preferences acquired from pre-training data and, further, that ICL does not consider all in-context information equally.