Retrieval-augmented generation (RAG) uses retrieved texts to enhance large language models (LLMs). However, studies show that RAG is not consistently effective and can even mislead LLMs when the retrieved texts are noisy or incorrect. This suggests that RAG carries a duality: both benefit and detriment. Although many existing methods attempt to address this issue, they lack a theoretical explanation for the duality in RAG. The benefit and detriment within this duality remain a black box that cannot be quantified or compared in an explainable manner. This paper takes the first step toward a theoretical explanation of the essence of benefit and detriment in RAG by: (1) decoupling and formalizing them from the RAG prediction, (2) approximating the gap between their values by representation similarity, and (3) establishing the trade-off mechanism between them, making them explainable, quantifiable, and comparable. We demonstrate that the distribution difference between retrieved texts and LLMs' knowledge acts as a double-edged sword, bringing both benefit and detriment. We also prove that the actual effect of RAG can be predicted at the token level. Based on our theory, we propose X-RAG, a novel and practical method that achieves collaborative generation between the pure LLM and RAG at the token level to preserve the benefit and avoid the detriment. Experiments on real-world tasks with LLMs including OPT, LLaMA-2, and Mistral demonstrate the effectiveness of our method and support our theoretical results.
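The token-level collaboration described above can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's actual X-RAG algorithm: the function names, the cosine-similarity proxy, and the fixed threshold `tau` are all assumptions made for illustration. The idea it captures is that when the RAG-conditioned and pure-LLM hidden representations for the next token are similar, benefit is predicted to outweigh detriment, so the RAG distribution is used; otherwise generation falls back to the pure LLM.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def collaborative_step(llm_logits, rag_logits, h_llm, h_rag, tau=0.5):
    """Toy per-token selection (illustrative only, not the paper's method).

    h_llm / h_rag: hidden representations of the next token under the pure
    LLM and under RAG. High similarity is taken as a proxy signal that the
    benefit of RAG outweighs its detriment for this token.
    """
    sim = cosine_similarity(h_llm, h_rag)
    return rag_logits if sim >= tau else llm_logits


# Usage: similar representations -> trust RAG; dissimilar -> pure LLM.
rag_pick = collaborative_step([0.1, 0.9], [0.8, 0.2],
                              h_llm=[1.0, 0.0], h_rag=[0.9, 0.1])
llm_pick = collaborative_step([0.1, 0.9], [0.8, 0.2],
                              h_llm=[1.0, 0.0], h_rag=[0.0, 1.0])
```

In a real decoder this decision would run inside the generation loop at every step, with the similarity threshold (or a learned predictor) calibrated rather than fixed.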