Retrieval-augmented generation (RAG) excels at enhancing the knowledge capabilities of large language models (LLMs) by supplying documents retrieved for user queries. However, existing RAG pipelines focus on improving response quality by enriching queries indiscriminately with retrieved information, paying little attention to what kind of knowledge LLMs actually need to answer the original queries more accurately. In this paper, we argue that long-tail knowledge is crucial for RAG, since LLMs have already memorized common world knowledge during large-scale pre-training. Based on this observation, we propose a simple but effective long-tail knowledge detection method for LLMs. Specifically, we derive a novel Generative Expected Calibration Error (GECE) metric that measures the ``long-tailness'' of knowledge using both statistical and semantic signals. We then retrieve relevant documents and infuse them into the model to patch knowledge gaps only when the input query involves long-tail knowledge. Experiments show that, compared with existing RAG pipelines, our method achieves over a 4x speedup in average inference time together with consistent performance improvements on downstream tasks.
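The gating idea described above can be sketched in a few lines. This is a hypothetical illustration only: the scoring function below is a toy statistical proxy (mean inverse log term frequency), not the paper's GECE metric, and `llm`, `retriever`, `term_freq`, and the threshold are stand-in names chosen for this sketch.

```python
import math

def long_tailness(query: str, term_freq: dict) -> float:
    """Toy long-tailness score: queries with rare terms (low corpus
    frequency) score high. A stand-in for the GECE metric, which also
    incorporates semantic signals."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    return sum(1.0 / math.log(2.0 + term_freq.get(t, 0)) for t in terms) / len(terms)

def answer(query: str, llm, retriever, term_freq: dict, threshold: float = 0.3) -> str:
    """Augment the prompt with retrieved documents only for long-tail queries."""
    if long_tailness(query, term_freq) >= threshold:
        docs = retriever(query)  # likely knowledge gap: retrieve and infuse
        prompt = "\n".join(docs) + "\n\nQuestion: " + query
    else:
        prompt = query  # common world knowledge: skip retrieval entirely
    return llm(prompt)
```

Because retrieval is skipped for the (typically far more numerous) common-knowledge queries, the average inference cost drops, which is the source of the reported speedup.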