Due to the extraordinarily large number of parameters, fine-tuning Large Language Models (LLMs) to update long-tail or out-of-date knowledge is impractical in many applications. To avoid fine-tuning, we can instead treat an LLM as a black box (i.e., freeze its parameters) and augment it with a Retrieval-Augmented Generation (RAG) system, namely black-box RAG. Recently, black-box RAG has achieved success in knowledge-intensive tasks and has gained much attention. Existing black-box RAG methods typically fine-tune the retriever to cater to LLMs' preferences and concatenate all the retrieved documents as the input, an approach that suffers from two issues: (1) Ignorance of Factual Information. The LLM-preferred documents may lack the factual information needed to answer the given question, which can mislead the retriever and hurt the effectiveness of black-box RAG; (2) Waste of Tokens. Simply concatenating all the retrieved documents introduces a large number of unnecessary tokens for LLMs, which degrades the efficiency of black-box RAG. To address these issues, this paper proposes a novel black-box RAG framework that utilizes factual information in retrieval and reduces the number of tokens used for augmentation, dubbed FIT-RAG. FIT-RAG utilizes factual information by constructing a bi-label document scorer. In addition, it reduces token usage by introducing a self-knowledge recognizer and a sub-document-level token reducer. FIT-RAG achieves both superior effectiveness and efficiency, which is validated by extensive experiments across three open-domain question-answering datasets: TriviaQA, NQ and PopQA. FIT-RAG improves the answering accuracy of Llama2-13B-Chat by 14.3\% on TriviaQA, 19.9\% on NQ and 27.5\% on PopQA, respectively. Furthermore, it saves approximately half of the tokens on average across the three datasets.
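The pipeline described above can be illustrated with a minimal sketch. All names here (`ScoredDoc`, `self_knowledge_recognizer`, `token_reducer`, the word-count token estimate, and the lookup-set stand-in for self-knowledge) are hypothetical simplifications for illustration, not the paper's actual implementation: each retrieved document carries two labels (factual-information score and LLM-preference score), retrieval augmentation is skipped entirely when the question falls within the model's own knowledge, and otherwise only the highest-scoring documents that fit a token budget are concatenated into the prompt.

```python
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    """A retrieved document with the two labels of a bi-label scorer."""
    text: str
    has_answer: float  # label 1: does the doc contain the factual answer?
    llm_pref: float    # label 2: does the LLM prefer this doc?


def self_knowledge_recognizer(question: str, known_questions: set) -> bool:
    # Hypothetical stand-in: treat questions found in a lookup set as ones
    # the LLM can already answer from its own (self-)knowledge.
    return question in known_questions


def token_reducer(docs: list, budget: int) -> list:
    # Greedily keep the highest-scoring documents within a token budget,
    # combining both labels; real token reduction works at sub-document level.
    ranked = sorted(docs, key=lambda d: d.has_answer + d.llm_pref, reverse=True)
    kept, used = [], 0
    for d in ranked:
        n = len(d.text.split())  # crude whitespace token count
        if used + n <= budget:
            kept.append(d)
            used += n
    return kept


def build_prompt(question: str, docs: list, known_questions: set, budget: int = 10):
    """Return (prompt, kept_docs); skip augmentation if self-knowledge suffices."""
    if self_knowledge_recognizer(question, known_questions):
        return question, []  # no retrieved tokens spent at all
    kept = token_reducer(docs, budget)
    prompt = question + "\n" + "\n".join(d.text for d in kept)
    return prompt, kept
```

Under this sketch, token savings come from two sources: answering from self-knowledge costs zero retrieved tokens, and otherwise low-scoring documents are dropped instead of being blindly concatenated.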