Retrieval-augmented generation (RAG) retrieves relevant information from an external knowledge source, allowing large language models (LLMs) to answer queries over previously unseen document collections. However, traditional RAG applications have been shown to perform poorly on multi-hop questions, which require retrieving and reasoning over multiple pieces of supporting evidence. We introduce Multi-Meta-RAG, a method that uses database filtering with LLM-extracted metadata to improve the RAG selection of documents, drawn from various sources, that are relevant to the question. Although database filtering is specific to a set of questions from a particular domain and format, we found that Multi-Meta-RAG greatly improves results on the MultiHop-RAG benchmark. The code is available at https://github.com/mxpoliakov/Multi-Meta-RAG.
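The idea of filtering by query metadata before ranking can be illustrated with a minimal sketch. It is not the implementation from the repository above: the source names, the `Chunk` type, and the helpers `extract_sources`, `overlap_score`, and `retrieve` are all hypothetical, a plain substring match stands in for the LLM extraction step, and a token-overlap count stands in for embedding similarity.

```python
from dataclasses import dataclass

# Hypothetical source names; a real store would hold the sources of its own corpus.
KNOWN_SOURCES = ["Alpha News", "Beta Times", "Gamma Post"]


@dataclass
class Chunk:
    text: str
    source: str  # metadata stored alongside the chunk text


def extract_sources(query: str) -> list[str]:
    """Stand-in for LLM metadata extraction (assumption): substring match."""
    lowered = query.lower()
    return [s for s in KNOWN_SOURCES if s.lower() in lowered]


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score (assumption): count of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, chunks: list[Chunk], top_k: int = 2) -> list[Chunk]:
    sources = extract_sources(query)
    # Filter only when metadata was found; otherwise search every chunk.
    candidates = [c for c in chunks if c.source in sources] if sources else chunks
    ranked = sorted(candidates, key=lambda c: overlap_score(query, c.text), reverse=True)
    return ranked[:top_k]


# Made-up chunks for illustration only.
chunks = [
    Chunk("The startup raised a new funding round", "Alpha News"),
    Chunk("The startup released a new phone", "Beta Times"),
    Chunk("The startup funding round was a rumor", "Gamma Post"),
]
query = "Did Alpha News and Beta Times both report on the startup funding round?"
results = retrieve(query, chunks)
```

In this sketch the Gamma Post chunk overlaps strongly with the query text, yet it is excluded because the query names only the other two sources. This is the situation in which a metadata filter can help a multi-hop question reach evidence from every source it mentions.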