Organizations seeking to utilize Large Language Models (LLMs) for knowledge querying and analysis often encounter challenges in maintaining an LLM fine-tuned on targeted, up-to-date information that keeps answers relevant and grounded. Retrieval Augmented Generation (RAG) has quickly become a feasible solution for organizations looking to overcome the challenges of maintaining proprietary models and to help reduce LLM hallucinations in their query responses. However, RAG comes with its own issues regarding scaling data pipelines across tiered-access and disparate data sources. In many scenarios, it is necessary to query beyond a single data silo to provide richer and more relevant context for an LLM. Analyzing data sources within and across organizational trust boundaries is often limited by complex data-sharing policies that prohibit centralized data storage and thereby inhibit the fast and effective setup and scaling of RAG solutions. In this paper, we introduce Confidential Computing (CC) techniques as a solution for secure Federated Retrieval Augmented Generation (FedRAG). Our proposed Confidential FedRAG system (C-FedRAG) enables secure connection and scaling of RAG workflows across a decentralized network of data providers by ensuring context confidentiality. We also demonstrate how to implement a C-FedRAG system using the NVIDIA FLARE SDK and assess its performance using the MedRAG toolkit and MIRAGE benchmarking dataset.