In-context learning (ICL) is an important paradigm for adapting large language models (LLMs) to new tasks, but the generalization behavior of ICL remains poorly understood. We investigate the inductive biases of ICL from the perspective of feature bias: which feature ICL is more likely to use given a set of underspecified demonstrations in which two features are equally predictive of the labels. First, we characterize the feature biases of GPT-3 models by constructing underspecified demonstrations from a range of NLP datasets and feature combinations. We find that LLMs exhibit clear feature biases: for example, they show a strong bias toward predicting labels according to sentiment rather than shallow lexical features such as punctuation. Second, we evaluate the effect of different interventions designed to impose an inductive bias in favor of a particular feature, such as adding a natural language instruction or using semantically relevant label words. We find that, while many interventions can influence the learner to prefer a particular feature, it can be difficult to overcome strong prior biases. Overall, our results provide a broader picture of the types of features that ICL may be more likely to exploit and of how to impose inductive biases that are better aligned with the intended task.
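The underspecified-demonstration setup described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy sentences, the names `Example`, `underspecified_demos`, `disambiguating_tests`, `build_prompt`, and `sentiment_preference`, the prompt template, and the two stub predictors that stand in for an LLM. The score computed below is a simple agreement rate and is not necessarily the metric used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Example:
    text: str
    sentiment: int    # 1 = positive, 0 = negative
    punctuation: int  # 1 = ends with "!", 0 = ends with "."


# Toy pool (hypothetical data): every sentence body appears with both endings.
BODIES = [
    ("I loved this film", 1),
    ("A wonderful meal", 1),
    ("I hated this film", 0),
    ("A terrible meal", 0),
]
POOL = [
    Example(body + ("!" if p else "."), s, p)
    for body, s in BODIES
    for p in (0, 1)
]


def underspecified_demos(pool: List[Example]) -> List[Example]:
    # Both features agree, so each predicts the label equally well.
    return [ex for ex in pool if ex.sentiment == ex.punctuation]


def disambiguating_tests(pool: List[Example]) -> List[Example]:
    # The features disagree, so a prediction reveals which one was used.
    return [ex for ex in pool if ex.sentiment != ex.punctuation]


def build_prompt(demos: List[Example], query: Example) -> str:
    # In the demonstrations the label equals both feature values.
    blocks = [f"Input: {d.text}\nLabel: {d.sentiment}" for d in demos]
    blocks.append(f"Input: {query.text}\nLabel:")
    return "\n\n".join(blocks)


def sentiment_preference(
    predict: Callable[[str], int],
    demos: List[Example],
    tests: List[Example],
) -> float:
    # Fraction of disambiguating queries answered according to sentiment.
    hits = sum(predict(build_prompt(demos, t)) == t.sentiment for t in tests)
    return hits / len(tests)


def _query_text(prompt: str) -> str:
    # Recover the final query sentence from the prompt template above.
    return prompt.rsplit("Input: ", 1)[1].split("\nLabel:")[0]


def punctuation_predictor(prompt: str) -> int:
    # Stub learner that relies only on the shallow punctuation feature.
    return int(_query_text(prompt).endswith("!"))


def sentiment_predictor(prompt: str) -> int:
    # Stub learner that relies only on (toy keyword) sentiment.
    query = _query_text(prompt)
    return int("loved" in query or "wonderful" in query)


demos = underspecified_demos(POOL)
tests = disambiguating_tests(POOL)
print(sentiment_preference(punctuation_predictor, demos, tests))  # 0.0
print(sentiment_preference(sentiment_predictor, demos, tests))    # 1.0
```

Replacing a stub predictor with a call to an actual model would give a score between 0 and 1; under these assumptions, values near 1 would indicate a preference for sentiment and values near 0 a preference for punctuation.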