In-context learning (ICL), the ability of large language models to perform novel tasks by conditioning on a prompt with a few task examples, requires these examples to be informative about the test instance. The standard approach of independently ranking candidates and selecting the most similar ones yields redundant examples while omitting important information. In this work, we show that BERTScore-Recall (BSR) selects better examples that demonstrate more of the salient aspects of the test input, e.g., its reasoning patterns. We further extend BSR and many standard metrics to easily optimizable set-level metrics, giving still better coverage of those salient aspects. On 15 datasets spanning 6 tasks and with 7 diverse LLMs, we show that (1) BSR is the superior metric for in-context example selection across the board, and (2) for compositional tasks, set selection using Set-BSR outperforms independent ranking by up to 17 points on average and, despite being training-free, surpasses methods that leverage task- or LLM-specific training.
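To make the contrast between independent ranking and set-level selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the test input plays the role of the reference in a recall-style BERTScore (each test token is matched to its most similar candidate token, with uniform token weights), that the set-level score credits each test token with its best match across the whole selected set, and that a greedy procedure is used to optimize it. The function names (`bsr`, `rank_independently`, `greedy_set_select`) and the toy 3-dimensional "token embeddings" are hypothetical; a real system would use contextual embeddings from an encoder.

```python
import numpy as np


def _unit(x):
    """Row-normalize token embeddings so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def _token_recall(test_emb, cand_emb):
    """For each test token, the best cosine similarity to any candidate token."""
    return (_unit(test_emb) @ _unit(cand_emb).T).max(axis=1)


def bsr(test_emb, cand_emb):
    """Recall-style score (assumed form): mean best-match similarity over test tokens."""
    return float(_token_recall(test_emb, cand_emb).mean())


def rank_independently(test_emb, pool, k):
    """Baseline: score every candidate on its own and keep the top k."""
    scores = [bsr(test_emb, c) for c in pool]
    return sorted(range(len(pool)), key=lambda i: -scores[i])[:k]


def greedy_set_select(test_emb, pool, k):
    """Greedy set-level selection (an assumed optimizer): repeatedly add the
    candidate that most improves per-token coverage of the test input."""
    recalls = [_token_recall(test_emb, c) for c in pool]
    covered = np.full(test_emb.shape[0], -1.0)  # lowest possible cosine
    selected = []
    for _ in range(min(k, len(pool))):
        best_i, best_score = None, -np.inf
        for i, r in enumerate(recalls):
            if i in selected:
                continue
            score = np.maximum(covered, r).mean()
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
        covered = np.maximum(covered, recalls[best_i])
    return selected


# Toy data: the test input has two "aspects" (two orthogonal tokens).
test_input = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
pool = [
    np.array([[1.0, 0.0, 0.0]]),  # covers aspect 1
    np.array([[1.0, 0.0, 0.0]]),  # duplicate of the first candidate
    np.array([[0.0, 0.8, 0.6]]),  # partially covers aspect 2
]

print(rank_independently(test_input, pool, 2))  # picks the two duplicates
print(greedy_set_select(test_input, pool, 2))   # picks one duplicate plus the aspect-2 example
```

In this toy setting, independent ranking keeps both copies of the same example because each scores highest on its own, while the greedy set-level variant swaps the second copy for the example that covers the otherwise missing aspect.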