Retrieval-augmented generation has emerged as an effective method for enhancing large language model (LLM) performance. This approach typically relies on an internal retrieval module that uses various indexing mechanisms to manage a static, pre-processed corpus. However, such a paradigm often falls short when generative inference must draw on up-to-date information that has not yet been incorporated into the corpus. In this paper, we explore an alternative approach that leverages standard search engine APIs to dynamically integrate the latest online information (without maintaining an index over any fixed corpus), thereby improving the quality of generated content. We design a collaborative LLM-based paradigm comprising: (i) a parser-LLM that, in a single inference, determines whether Internet-augmented generation is needed and, if so, extracts the search keywords; (ii) a mixed ranking strategy that re-ranks the retrieved HTML files to eliminate bias introduced by the search engine API; and (iii) an extractor-LLM that accurately and efficiently extracts relevant information from the fresh content in each HTML file. We conduct extensive empirical studies to evaluate the performance of this Internet search augmented generation paradigm. The experimental results demonstrate that our method generates content of significantly improved quality. Our system has been successfully deployed in a production environment to serve 01.AI's generative inference requests.
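The three components described above can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions, not the deployed system: the names `Page`, `mixed_rerank`, and `answer` are hypothetical, the parser-LLM, search API, extractor-LLM, and generator are passed in as plain callables, and the re-ranking formula (a weighted blend of the engine's own ordering and a query-relevance score) is an assumed stand-in, since the abstract does not specify the mixed ranking strategy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    url: str
    html: str
    engine_rank: int  # position returned by the search API (0 = top result)


def mixed_rerank(pages: List[Page], query: str,
                 relevance: Callable[[str, str], float],
                 alpha: float = 0.5) -> List[Page]:
    """Blend the engine's ordering with a relevance score (assumed scheme)."""
    n = max(len(pages), 1)

    def score(page: Page) -> float:
        engine_score = 1.0 - page.engine_rank / n  # higher for earlier results
        return alpha * engine_score + (1.0 - alpha) * relevance(query, page.html)

    return sorted(pages, key=score, reverse=True)


def answer(query: str,
           parse: Callable[[str], Optional[List[str]]],
           search: Callable[[List[str]], List[Page]],
           relevance: Callable[[str, str], float],
           extract: Callable[[str, str], str],
           generate: Callable[[str, List[str]], str],
           top_k: int = 3) -> str:
    # (i) Parser step: one call decides whether to search and yields keywords.
    keywords = parse(query)  # None means no Internet augmentation is needed
    if keywords is None:
        return generate(query, [])
    # (ii) Re-rank the retrieved pages instead of trusting the engine's order.
    pages = mixed_rerank(search(keywords), query, relevance)[:top_k]
    # (iii) Extractor step: keep only the relevant text from each page.
    snippets = [s for s in (extract(query, p.html) for p in pages) if s]
    return generate(query, snippets)
```

Passing the components as callables keeps the sketch independent of any particular model or search provider; in practice each callable would wrap an LLM call or an HTTP request.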