Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation

Few-shot fine-tuning and in-context learning are two alternative strategies for task adaptation of pre-trained language models. Recently, in-context learning has gained popularity over fine-tuning due to its simplicity and improved out-of-domain generalization, and because extensive evidence shows that fine-tuned models pick up on spurious correlations. Unfortunately, previous comparisons of the two approaches were done using models of different sizes. This raises the question of whether the observed weaker out-of-domain generalization of fine-tuned models is an inherent property of fine-tuning or a limitation of the experimental setup. In this paper, we compare the generalization of few-shot fine-tuning and in-context learning to challenge datasets, while controlling for the models used, the number of examples, and the number of parameters, ranging from 125M to 30B. Our results show that fine-tuned language models can in fact generalize well out-of-domain. We find that both approaches generalize similarly; they exhibit large variation and depend on properties such as model size and the number of examples, highlighting that robust task adaptation remains a challenge.
