Recent developments in large pre-trained language models have enabled unprecedented performance on a variety of downstream tasks. Achieving the best performance with these models often leverages in-context learning, where a model performs a (possibly new) task given one or more examples. However, recent work has shown that the choice of examples can have a large impact on task performance and that finding an optimal set of examples is non-trivial. While there are many existing methods for selecting in-context examples, they generally score examples independently, ignoring the dependencies between them and the order in which they are provided to the model. In this work, we propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning. We frame the problem of sequential example selection as a Markov decision process and train an example retriever using reinforcement learning. We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches heuristic and learnable baselines. We also use case studies to show that RetICL implicitly learns representations of problem-solving strategies.