Large language models (LLMs) can learn in-context, performing a variety of tasks from only a few input-output examples. However, the effectiveness of in-context learning depends heavily on the quality of the selected examples. In this paper, we propose a novel framework that iteratively trains dense retrievers to identify high-quality in-context examples for LLMs. Our framework first trains a reward model on LLM feedback to evaluate the quality of candidate examples, then uses knowledge distillation to train a bi-encoder-based dense retriever. Our experiments on a suite of 30 tasks demonstrate that our framework significantly enhances in-context learning performance. Furthermore, we show that our framework generalizes to tasks unseen during training. An in-depth analysis reveals that our model improves performance by retrieving examples with similar patterns, and that the gains are consistent across LLMs of varying sizes.
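The distillation step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the training objective, so everything here is an assumption made for illustration: the reward model (teacher) and the bi-encoder retriever (student) each score the same candidate examples, the retriever score is taken to be a dot product between query and candidate embeddings, and the loss is a KL divergence between the two softmax distributions. The function names `log_softmax`, `retriever_scores`, `distillation_loss`, and `top_k` are hypothetical.

```python
import math


def log_softmax(scores, temperature=1.0):
    """Numerically stable log-softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def retriever_scores(query_vec, candidate_vecs):
    """Bi-encoder scoring (assumed): dot product of independently encoded vectors."""
    return [sum(q * c for q, c in zip(query_vec, cand)) for cand in candidate_vecs]


def distillation_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over the candidate list (assumed objective).

    teacher_scores: reward-model scores for each candidate example.
    student_scores: retriever similarity scores for the same candidates.
    """
    log_p = log_softmax(teacher_scores, temperature)
    log_q = log_softmax(student_scores, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def top_k(query_vec, candidate_vecs, k):
    """Indices of the k highest-scoring candidates, best first."""
    scores = retriever_scores(query_vec, candidate_vecs)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy 2-d embeddings; a real system would use learned encoder outputs.
    query = [1.0, 0.0]
    candidates = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
    reward = [2.0, -1.0, 0.5]  # hypothetical reward-model scores
    student = retriever_scores(query, candidates)
    print("loss:", distillation_loss(reward, student))
    print("retrieved:", top_k(query, candidates, k=2))
```

Under these assumptions, minimizing the loss pushes the retriever's ranking of candidates toward the reward model's ranking, while inference needs only the inexpensive dot-product scoring in `top_k`.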