In-context learning (ICL) with large language models is challenging for tasks with many labels because the limited context window makes it difficult to fit a sufficient number of examples in the prompt. In this paper, we use a pre-trained dense retrieval model to bypass this limitation, giving the model only a partial view of the full label space for each inference call. Testing with recent open-source LLMs (OPT, LLaMA), we set new state-of-the-art performance in few-shot settings on three common intent classification datasets, with no fine-tuning. We also surpass fine-tuned performance on fine-grained sentiment classification in certain cases. We analyze performance across the number of in-context examples and across model scales, showing that larger models are necessary to effectively and consistently make use of longer contexts for ICL. Through several ablations, we analyze the model's use of: a) the similarity of the in-context examples to the current input, b) the semantic content of the class names, and c) the correct correspondence between examples and labels. We demonstrate that all three are needed to varying degrees depending on the domain, contrary to certain recent works.
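The retrieval idea can be illustrated with a minimal sketch: for each test input, only the labeled examples whose embeddings are most similar to the input's embedding are placed in the prompt, so the prompt covers only a subset of the label space. This assumes the embeddings were already computed by some dense retriever; the toy 3-d vectors, the `sentence:`/`intent:` template, the choice to put the most similar example closest to the query, and all function names are hypothetical illustrations, not the paper's exact setup.

```python
import math
from typing import List, Tuple

# (text, label, embedding). In practice the embedding would come from a
# pre-trained dense retriever; here the vectors are hand-made toy values.
Example = Tuple[str, str, List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_emb: List[float], pool: List[Example], k: int) -> List[Example]:
    """Return the k pool examples most similar to the query, best first."""
    return sorted(pool, key=lambda ex: cosine(query_emb, ex[2]), reverse=True)[:k]


def build_prompt(query_text: str, demos: List[Example]) -> str:
    """Hypothetical template: demos in reverse similarity order, so the
    most similar demonstration sits right before the query."""
    parts = [f"sentence: {text}\nintent: {label}\n" for text, label, _ in reversed(demos)]
    parts.append(f"sentence: {query_text}\nintent:")
    return "\n".join(parts)


# Toy labeled pool (hypothetical data).
pool: List[Example] = [
    ("what is my balance", "check_balance", [1.0, 0.1, 0.0]),
    ("how much money do I have", "check_balance", [0.9, 0.2, 0.0]),
    ("book a flight to paris", "book_flight", [0.0, 1.0, 0.1]),
    ("cancel my card", "cancel_card", [0.1, 0.0, 1.0]),
]

query_text = "show my account balance"
query_emb = [0.95, 0.15, 0.05]

demos = retrieve(query_emb, pool, k=2)
prompt = build_prompt(query_text, demos)
# Only the labels of the retrieved demos appear in the prompt:
# a partial view of the full label space.
labels_in_prompt = sorted({label for _, label, _ in demos})
print(prompt)
print(labels_in_prompt)
```

With a budget of `k` demonstrations, the context cost stays fixed no matter how many classes the task has; the LLM's completion after `intent:` would then be taken as the prediction.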