Due to the extraordinarily large number of parameters, fine-tuning Large Language Models (LLMs) to update long-tail or out-of-date knowledge is impractical in many applications. To avoid fine-tuning, we can instead treat an LLM as a black box (i.e., freeze its parameters) and augment it with a Retrieval-Augmented Generation (RAG) system, namely black-box RAG. Recently, black-box RAG has achieved success in knowledge-intensive tasks and has gained much attention. Existing black-box RAG methods typically fine-tune the retriever to cater to LLMs' preferences and concatenate all the retrieved documents as the input, an approach that suffers from two issues: (1) Ignorance of Factual Information. The LLM-preferred documents may lack the factual information needed to answer the given question, which can mislead the retriever and hurt the effectiveness of black-box RAG; (2) Waste of Tokens. Simply concatenating all the retrieved documents introduces a large number of unnecessary tokens for LLMs, which degrades the efficiency of black-box RAG. To address these issues, this paper proposes a novel black-box RAG framework that utilizes factual information in retrieval and reduces the number of tokens used for augmentation, dubbed FIT-RAG. FIT-RAG utilizes factual information by constructing a bi-label document scorer. In addition, it reduces token usage by introducing a self-knowledge recognizer and a sub-document-level token reducer. FIT-RAG achieves both superior effectiveness and efficiency, which is validated by extensive experiments across three open-domain question-answering datasets: TriviaQA, NQ and PopQA. FIT-RAG improves the answering accuracy of Llama2-13B-Chat by 14.3\% on TriviaQA, 19.9\% on NQ and 27.5\% on PopQA, respectively. Furthermore, it saves approximately half of the tokens on average across the three datasets.
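The pipeline described above can be illustrated with a minimal sketch. All names here (`ScoredDoc`, `self_knowledge_recognizer`, `token_reducer`, the word-count token estimate, and the lookup-set stand-in for self-knowledge) are hypothetical simplifications for illustration, not the paper's actual implementation: each retrieved document carries two labels (factual-information score and LLM-preference score), retrieval augmentation is skipped entirely when the question falls within the model's own knowledge, and otherwise only the highest-scoring documents that fit a token budget are concatenated into the prompt.

```python
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    """A retrieved document with the two labels of a bi-label scorer."""
    text: str
    has_answer: float  # label 1: does the doc contain the factual answer?
    llm_pref: float    # label 2: does the LLM prefer this doc?


def self_knowledge_recognizer(question: str, known_questions: set) -> bool:
    # Hypothetical stand-in: treat questions found in a lookup set as ones
    # the LLM can already answer from its own (self-)knowledge.
    return question in known_questions


def token_reducer(docs: list, budget: int) -> list:
    # Greedily keep the highest-scoring documents within a token budget,
    # combining both labels; real token reduction works at sub-document level.
    ranked = sorted(docs, key=lambda d: d.has_answer + d.llm_pref, reverse=True)
    kept, used = [], 0
    for d in ranked:
        n = len(d.text.split())  # crude whitespace token count
        if used + n <= budget:
            kept.append(d)
            used += n
    return kept


def build_prompt(question: str, docs: list, known_questions: set, budget: int = 10):
    """Return (prompt, kept_docs); skip augmentation if self-knowledge suffices."""
    if self_knowledge_recognizer(question, known_questions):
        return question, []  # no retrieved tokens spent at all
    kept = token_reducer(docs, budget)
    prompt = question + "\n" + "\n".join(d.text for d in kept)
    return prompt, kept
```

Under this sketch, token savings come from two sources: answering from self-knowledge costs zero retrieved tokens, and otherwise low-scoring documents are dropped instead of being blindly concatenated.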