In recent years, pre-trained large language models have demonstrated a remarkable inference-time few-shot learning capability known as in-context learning. However, existing literature has shown that this capability is sensitive to the selection of few-shot demonstrations, and the mechanisms by which it arises from regular language model pretraining objectives remain poorly understood. In this study, we examine in-context learning through a Bayesian lens, viewing large language models as topic models that implicitly infer task-related information from demonstrations. On this premise, we propose an algorithm for selecting optimal demonstrations from a set of annotated data and demonstrate a significant 12.5% improvement relative to the random selection baseline, averaged over eight GPT2 and GPT3 models on eight different real-world text classification datasets. Our empirical findings support our hypothesis that large language models implicitly infer a latent concept variable.
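As a rough illustration of the Bayesian view described above, the prediction for a test input can be written as a marginalization over a latent concept variable. The notation here is an assumption introduced for illustration and is not taken from the abstract: $\theta$ denotes the latent concept, $D = \{(x_i, y_i)\}_{i=1}^{k}$ the demonstrations, and $(x, y)$ the test input and its label. The sketch further assumes that $y$ is conditionally independent of $D$ given $\theta$ and $x$.

```latex
\[
  P(y \mid D, x) \;=\; \int_{\Theta} P(y \mid \theta, x)\, P(\theta \mid D, x)\, \mathrm{d}\theta
\]
```

Under this reading, demonstrations matter only through the posterior $P(\theta \mid D, x)$, which suggests one possible explanation for why the choice of demonstrations affects the prediction: a demonstration set that concentrates the posterior on the task-relevant concept should yield a prediction closer to $P(y \mid \theta, x)$ for that concept.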