Large language models (LLMs) can learn in context, performing a variety of tasks from only a few input-output examples. However, the effectiveness of in-context learning depends heavily on the quality of the selected examples. In this paper, we propose a novel framework that iteratively trains dense retrievers to identify high-quality in-context examples for LLMs. The framework first trains a reward model on LLM feedback to evaluate the quality of candidate examples, and then uses knowledge distillation to train a bi-encoder-based dense retriever. Experiments on a suite of 30 tasks demonstrate that our framework significantly enhances in-context learning performance. Furthermore, we show that the framework generalizes to tasks unseen during training. An in-depth analysis reveals that our model improves performance by retrieving examples with similar patterns, and that the gains are consistent across LLMs of varying sizes. The code and data are available at https://github.com/microsoft/LMOps/tree/main/llm_retriever.
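The distillation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the training objective, so the KL-divergence loss below is an assumption, as are the function names `distillation_loss` and `retrieve_top_k`, the `temperature` parameter, and the random toy data; this is not the implementation in the linked repository. The sketch treats reward-model scores over a query's candidate examples as the teacher distribution and the bi-encoder's dot-product similarities as the student distribution.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def distillation_loss(query_emb, cand_embs, reward_scores, temperature=1.0):
    """KL(teacher || student) over one query's candidate list.

    Assumption: the teacher distribution is a softmax over reward-model
    scores, and the student distribution is a softmax over bi-encoder
    dot products. The abstract does not state the exact objective.
    """
    student_logits = cand_embs @ query_emb / temperature
    log_p_student = log_softmax(student_logits)
    log_p_teacher = log_softmax(reward_scores)
    p_teacher = np.exp(log_p_teacher)
    return float(np.sum(p_teacher * (log_p_teacher - log_p_student)))


def retrieve_top_k(query_emb, cand_embs, k):
    """Return indices of the k candidates with the highest dot-product score."""
    scores = cand_embs @ query_emb
    return np.argsort(-scores)[:k]


# Toy data standing in for encoder outputs and reward-model scores (hypothetical).
rng = np.random.default_rng(0)
query_emb = rng.normal(size=8)          # one query embedding
cand_embs = rng.normal(size=(5, 8))     # five candidate example embeddings
reward_scores = rng.normal(size=5)      # teacher scores for the same candidates

loss = distillation_loss(query_emb, cand_embs, reward_scores)
top2 = retrieve_top_k(query_emb, cand_embs, k=2)
```

In a real training loop the loss would be minimized with respect to the encoder parameters that produce `query_emb` and `cand_embs`; at inference time only `retrieve_top_k` is needed, since a bi-encoder allows candidate embeddings to be precomputed and indexed.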