Information Retrieval (IR), the process of finding information to satisfy users' information needs, plays an essential role in modern life. Recently, large language models (LLMs) have demonstrated remarkable capabilities across various tasks, some of which are important for IR. Nonetheless, LLMs frequently generate responses that lack specificity, which in many cases limits their overall effectiveness for IR. To address this issue, we present an unsupervised alignment framework called Reinforcement Learning from Contrastive Feedback (RLCF), which empowers LLMs to generate responses that are both high-quality and context-specific, as IR tasks require. Specifically, we construct contrastive feedback by comparing each document with its similar documents, and then propose a reward function named Batched-MRR to teach LLMs to generate responses that capture the fine-grained information that distinguishes each document from its similar ones. To demonstrate the effectiveness of RLCF, we conducted experiments in two typical applications of LLMs in IR, i.e., data augmentation and summarization. The experimental results show that RLCF can effectively improve the performance of LLMs in the IR context.
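The abstract does not give the formula for Batched-MRR, so the following is only a minimal sketch of how a reciprocal-rank reward over a batch of similar documents could look, not the authors' implementation. It assumes that each response in a batch is scored against every document in that batch and rewarded with the reciprocal rank of its own source document. The function names, the toy `token_overlap` scorer, and the pessimistic tie-breaking rule are all illustrative assumptions; a real setup would presumably use a trained retriever as the scorer.

```python
from typing import Callable, List, Sequence


def token_overlap(response: str, document: str) -> float:
    """Toy relevance score (an assumption): share of response tokens found in the document."""
    resp_tokens = response.lower().split()
    doc_tokens = set(document.lower().split())
    if not resp_tokens:
        return 0.0
    return sum(tok in doc_tokens for tok in resp_tokens) / len(resp_tokens)


def batched_reciprocal_rank_rewards(
    responses: Sequence[str],
    documents: Sequence[str],
    score: Callable[[str, str], float] = token_overlap,
) -> List[float]:
    """Reward response i with 1 / rank of documents[i] among all documents in the batch.

    Hypothetical sketch: responses[i] is assumed to have been generated from
    documents[i], and the batch is assumed to contain mutually similar documents.
    """
    if len(responses) != len(documents):
        raise ValueError("responses and documents must have the same length")
    rewards = []
    for i, response in enumerate(responses):
        scores = [score(response, doc) for doc in documents]
        # Pessimistic rank: ties count against the source document, so a generic
        # response that matches several similar documents equally well is penalized.
        rank = sum(s >= scores[i] for s in scores)
        rewards.append(1.0 / rank)
    return rewards


# A batch of similar documents and one candidate response per document.
documents = [
    "red apple pie recipe",
    "green apple pie recipe",
    "red cherry pie recipe",
]
responses = ["apple pie recipe", "green apple", "cherry pie"]

rewards = batched_reciprocal_rank_rewards(responses, documents)
print(rewards)
```

In this toy batch, the first response fits two documents equally well and so earns a lower reward than the other two, which each single out their own source document. That is the behavior a contrastive reward is meant to encourage.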