Retrieval-Augmented Generation (RAG) has proven to be an effective method for mitigating the hallucination issues inherent in large language models (LLMs). Previous approaches typically train retrievers on semantic similarity alone, without optimizing them for RAG. More recent work has proposed aligning retrievers with the preference signals of LLMs. However, these preference signals are often difficult for dense retrievers, which typically have weaker language capabilities, to understand and learn effectively. Drawing inspiration from pedagogical theories such as Guided Discovery Learning, we propose a novel framework, FiGRet (Fine-grained Guidance for Retrievers), which leverages the language capabilities of LLMs to construct examples from a more granular, information-centric perspective to guide the learning of retrievers. Specifically, our method uses LLMs to construct easy-to-understand examples from samples on which the retriever performs poorly, focusing on three learning objectives highly relevant to the RAG scenario: relevance, comprehensiveness, and purity. These examples serve as scaffolding that ultimately aligns the retriever with the LLM's preferences. Furthermore, we employ a dual curriculum learning strategy and exploit the reciprocal feedback between the LLM and the retriever to further enhance the performance of the RAG system. A series of experiments demonstrates that our framework improves the performance of RAG systems equipped with different retrievers and is applicable to various LLMs.