The predictions of Large Language Models (LLMs) on downstream tasks often improve significantly when examples of the input--label relationship are included in the context. However, there is currently no consensus about how this in-context learning (ICL) ability of LLMs works. For example, while Xie et al. (2021) liken ICL to a general-purpose learning algorithm, Min et al. (2022) argue that ICL does not even learn label relationships from in-context examples. In this paper, we provide novel insights into how ICL leverages label information, revealing both capabilities and limitations. To obtain a comprehensive picture of ICL behavior, we study probabilistic aspects of ICL predictions and thoroughly examine the dynamics of ICL as more examples are provided. Our experiments show that ICL predictions almost always depend on in-context labels and that ICL can learn truly novel tasks in-context. However, we also find that ICL struggles to fully overcome prediction preferences acquired from pre-training data and, further, that ICL does not consider all in-context information equally.