Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems

Retrieval-augmented generation (RAG) techniques leverage the in-context learning capabilities of large language models (LLMs) to produce more accurate and relevant responses. Originating from the simple 'retrieve-then-read' approach, the RAG framework has evolved into a highly flexible and modular paradigm. A critical component, the Query Rewriter module, enhances knowledge retrieval by generating a search-friendly query that aligns the input question more closely with the knowledge base. Our research identifies opportunities to extend the Query Rewriter module into Query Rewriter+, which generates multiple queries to overcome the Information Plateaus associated with a single query and rewrites questions to eliminate Ambiguity, thereby clarifying the underlying intent. We also find that current RAG systems suffer from Irrelevant Knowledge; to address this, we propose the Knowledge Filter. Both modules are built on the instruction-tuned Gemma-2B model, and together they enhance response quality. The final issue we identify is Redundant Retrieval, which we address with the Memory Knowledge Reservoir and the Retriever Trigger. The former supports dynamic expansion of the RAG system's knowledge base in a parameter-free manner, while the latter optimizes the cost of accessing external knowledge, thereby improving resource utilization and response efficiency. Together, these four modules improve the response quality and efficiency of the RAG system. Their effectiveness has been validated through experiments and ablation studies on six common QA datasets. The source code is available at https://github.com/Ancientshi/ERM4.
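
To make the interplay of the four modules concrete, the following is a minimal Python sketch of how they might be wired together. It is an illustration under stated assumptions, not the implementation in the ERM4 repository: every name here (`MemoryKnowledgeReservoir`, `gather_knowledge`, `token_overlap`, and the `rewrite`, `retrieve`, and `is_relevant` callables) is hypothetical, the Gemma-2B-based Query Rewriter+ and Knowledge Filter are represented by plain callables, and the Retriever Trigger is approximated by a simple token-overlap threshold against previously cached queries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two queries."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class MemoryKnowledgeReservoir:
    """Hypothetical cache of retrieved passages, keyed by the query that fetched them."""
    store: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, query: str, passages: List[str]) -> None:
        self.store[query] = list(passages)


def gather_knowledge(
    question: str,
    rewrite: Callable[[str], List[str]],       # stand-in for Query Rewriter+
    retrieve: Callable[[str], List[str]],      # stand-in for external retrieval
    is_relevant: Callable[[str, str], bool],   # stand-in for the Knowledge Filter
    reservoir: MemoryKnowledgeReservoir,
    threshold: float = 0.8,                    # assumed trigger threshold
) -> List[str]:
    """Collect filtered knowledge for a question, reusing cached results when possible."""
    knowledge: List[str] = []
    for query in rewrite(question):
        # Retriever Trigger (assumed form): skip external retrieval when a
        # sufficiently similar query is already in the reservoir.
        best = max(reservoir.store, key=lambda q: token_overlap(q, query), default=None)
        if best is not None and token_overlap(best, query) >= threshold:
            passages = reservoir.store[best]
        else:
            passages = retrieve(query)
            reservoir.add(query, passages)
        # Knowledge Filter: keep only passages judged relevant to the question.
        for passage in passages:
            if is_relevant(question, passage) and passage not in knowledge:
                knowledge.append(passage)
    return knowledge
```

In this sketch, calling `gather_knowledge` twice with the same question and the same reservoir triggers external retrieval only on the first call; the second call is served entirely from the reservoir, which is the kind of redundancy the Retriever Trigger is meant to avoid.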
