In-context learning is a new learning paradigm in which a language model conditions on a few input-output pairs (demonstrations) and a test input, and directly outputs the prediction. Its performance has been shown to depend heavily on the provided demonstrations, which has motivated research on demonstration retrieval: given a test input, relevant examples are retrieved from the training set to serve as informative demonstrations for in-context learning. Previous works train task-specific retrievers separately for individual tasks, but these methods are hard to transfer and scale across tasks, and separately trained retrievers incur substantial parameter storage and deployment costs. In this paper, we propose the Unified Demonstration Retriever (\textbf{UDR}), a single model that retrieves demonstrations for a wide range of tasks. To train UDR, we use the language model's feedback to cast the training signals of various tasks into a unified list-wise ranking formulation. We then propose a multi-task list-wise ranking training framework with an iterative mining strategy for finding high-quality candidates, which helps UDR fully incorporate the signals of the various tasks. Experiments on 30+ tasks across 13 task families and multiple data domains show that UDR significantly outperforms baselines. Further analyses show the effectiveness of each proposed component and UDR's strong ability in various scenarios, including different LMs (1.3B to 175B parameters), unseen datasets, and varying numbers of demonstrations.