Hybrid Retrieval-Augmented Generation for Real-time Composition Assistance

Retrieval-augmented models show promise in enhancing traditional language models by improving their contextual understanding, integrating private data, and reducing hallucination. However, the processing time required by retrieval-augmented large language models poses a challenge for tasks that need real-time responses, such as composition assistance. To overcome this limitation, we propose the Hybrid Retrieval-Augmented Generation (HybridRAG) framework, which combines a client model with a cloud model. HybridRAG incorporates retrieval-augmented memory that a Large Language Model (LLM) in the cloud generates asynchronously. By integrating this memory, the client model can generate highly effective responses that benefit from the LLM's capabilities. Furthermore, because the memory is integrated asynchronously, the client model can respond to user requests in real time without waiting for memory synchronization from the cloud. Our experiments on Wikitext and Pile subsets show that HybridRAG achieves lower latency than a cloud-based retrieval-augmented LLM while outperforming client-only models in utility.
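
The asynchronous split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `cloud_generate_memory`, `client_generate`, and `HybridClient` are hypothetical names, the cloud call is simulated with a short sleep, and the "client model" is a stub that only reports how many memory items it saw. The point is only to show a client that answers immediately from whatever memory it currently holds while a cloud request runs in the background.

```python
from __future__ import annotations

import threading
import time


def cloud_generate_memory(context: str) -> list[str]:
    """Hypothetical stand-in for the cloud side (retrieval plus LLM).
    It only simulates network and inference latency with a sleep."""
    time.sleep(0.2)
    return [f"note about: {context}"]


def client_generate(prompt: str, memory: list[str]) -> str:
    """Hypothetical stand-in for the small on-device model."""
    return f"{prompt} [memory items used: {len(memory)}]"


class HybridClient:
    """Holds the latest memory and never blocks on the cloud (assumed design)."""

    def __init__(self) -> None:
        self._memory: list[str] = []
        self._lock = threading.Lock()

    def refresh_memory_async(self, context: str) -> threading.Thread:
        """Start a cloud memory request in the background and return at once."""

        def worker() -> None:
            new_memory = cloud_generate_memory(context)
            with self._lock:
                self._memory = new_memory

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread

    def complete(self, prompt: str) -> str:
        """Answer immediately with whatever memory is available right now."""
        with self._lock:
            snapshot = list(self._memory)
        return client_generate(prompt, snapshot)


client = HybridClient()
pending = client.refresh_memory_async("The history of Rome")
first = client.complete("The history of Rome")   # does not wait for the cloud
pending.join()                                   # waiting here is for the demo only
second = client.complete("The history of Rome")  # now sees the cloud memory
print(first)
print(second)
```

In this sketch the first completion is produced before the simulated cloud call finishes, so it uses an empty memory; the second one, issued after the background thread has stored its result, uses the refreshed memory. A real system would replace both stubs with actual models and decide when a refresh is worth requesting.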

