In-Context Learning (ICL) enables Large Language Models (LLMs) to perform new tasks by conditioning on prompts with relevant information. Retrieval-Augmented Generation (RAG) enhances ICL by incorporating retrieved documents into the LLM's context at query time. However, traditional retrieval methods focus on semantic relevance, treating retrieval as a search problem. In this paper, we propose reframing retrieval for ICL as a recommendation problem, aiming to select documents that maximize utility in ICL tasks. We introduce the In-Context Learning Embedding and Reranker Benchmark (ICLERB), a novel evaluation framework that compares retrievers based on their ability to enhance LLM accuracy in ICL settings. Additionally, we propose a novel Reinforcement Learning-to-Rank from AI Feedback (RLRAIF) algorithm, designed to fine-tune retrieval models using minimal feedback from the LLM. Our experimental results reveal notable differences between ICLERB and existing benchmarks, and demonstrate that small models fine-tuned with our RLRAIF algorithm outperform large state-of-the-art retrieval models. These findings highlight the limitations of existing evaluation methods and the need for specialized benchmarks and training strategies adapted to ICL.
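To make the evaluation criterion concrete, the following is a minimal sketch of the utility-based scoring loop the benchmark implies: a retriever is judged not by query-document similarity, but by the downstream LLM accuracy it induces in an ICL setting. The interfaces `retriever.top_k`, `llm.answer`, and the exact-match scoring are illustrative assumptions, not ICLERB's actual implementation.

```python
def icl_accuracy(retriever, llm, dataset, k=4):
    """Score a retriever by the downstream ICL accuracy it induces.

    Hypothetical interfaces (assumptions, not ICLERB's actual API):
      - retriever.top_k(query, k): returns k candidate documents
      - llm.answer(prompt): returns the model's string response
      - dataset: iterable of (query, gold_answer) pairs
    """
    correct, total = 0, 0
    for query, gold in dataset:
        docs = retriever.top_k(query, k)        # candidate in-context examples
        context = "\n\n".join(docs)
        prompt = f"{context}\n\nQuestion: {query}\nAnswer:"
        prediction = llm.answer(prompt).strip()
        correct += int(prediction == gold)      # exact-match utility signal
        total += 1
    return correct / total                      # higher = more useful retriever
```

Under this view, candidate retrievers are ranked by the utility score above rather than by semantic relevance alone, which is what distinguishes the recommendation framing from the traditional search framing.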