Retrieval-augmented generation (RAG) is a promising approach for enabling large language models (LLMs) to generate more factual, accurate, and up-to-date content. Existing methods either optimize prompts to guide LLMs in leveraging retrieved information or directly fine-tune LLMs to adapt to RAG scenarios. Although fine-tuning can yield better performance, it often compromises the LLMs' general generation capabilities by modifying their parameters. This limitation poses challenges in practical applications, especially when LLMs are already deployed, as parameter adjustments may affect their original functionality. To address this, we propose a novel method that learns scalable and pluggable virtual tokens for RAG. By keeping the LLMs' original parameters frozen and fine-tuning only the embeddings of these pluggable tokens, our approach not only enhances LLMs' performance but also preserves their general generation capabilities. Furthermore, we design several training strategies to improve the scalability, flexibility, and generalizability of our method. Comprehensive experiments across 12 question-answering tasks demonstrate the superiority of our approach.
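The core mechanism described above — freezing the base model and training only the embeddings of prepended virtual tokens — can be sketched as follows. This is a minimal illustration in the spirit of prompt-tuning, not the paper's actual implementation; the class name, the toy one-layer "LM", and all hyperparameters are invented for demonstration.

```python
import torch
import torch.nn as nn

class PluggableVirtualTokens(nn.Module):
    """Sketch: wrap a frozen base LM and prepend trainable virtual-token
    embeddings (the only new parameters) to the input sequence."""

    def __init__(self, base_lm: nn.Module, embed_dim: int, num_virtual: int = 4):
        super().__init__()
        self.base_lm = base_lm
        for p in self.base_lm.parameters():
            p.requires_grad = False  # original LLM parameters stay untouched
        # Trainable embeddings for the pluggable virtual tokens.
        self.virtual = nn.Parameter(torch.randn(num_virtual, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # "Plug in" the virtual tokens in front of the retrieved
        # context + question embeddings, then run the frozen LM.
        batch = input_embeds.size(0)
        v = self.virtual.unsqueeze(0).expand(batch, -1, -1)
        return self.base_lm(torch.cat([v, input_embeds], dim=1))

# Toy stand-in for a frozen LM: a single linear layer over embeddings.
toy_lm = nn.Linear(16, 16)
model = PluggableVirtualTokens(toy_lm, embed_dim=16, num_virtual=4)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the virtual-token embeddings are trainable
```

Because the base model is untouched, the virtual tokens can be plugged in for RAG queries and simply omitted for ordinary generation, which is what preserves the model's general capabilities; and since each token's embedding is independent, the number of tokens can be scaled without retraining the rest.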