Privacy-Aware RAG: Secure and Isolated Knowledge Retrieval

The widespread adoption of Retrieval-Augmented Generation (RAG) systems in real-world applications has heightened concerns about the confidentiality and integrity of their proprietary knowledge bases. These knowledge bases, which play a critical role in enhancing the generative capabilities of Large Language Models (LLMs), are increasingly vulnerable to breaches that could expose sensitive information. To address these challenges, this paper proposes an advanced encryption methodology designed to protect RAG systems from unauthorized access and data leakage. Our approach encrypts both textual content and its corresponding embeddings prior to storage, so that nothing is stored in plaintext. Access is thereby restricted to authorized entities holding the appropriate decryption keys, which significantly reduces the risk of unintended data exposure. Furthermore, we demonstrate that our encryption strategy preserves the performance and functionality of RAG pipelines and remains compatible with diverse domains and applications. To validate the robustness of our method, we provide comprehensive security proofs that establish its resilience against potential threats and vulnerabilities. These proofs also reveal limitations in existing approaches, which often lack robustness or adaptability, or rely on open-source models. Our findings suggest that integrating advanced encryption techniques into the design and deployment of RAG systems can effectively strengthen privacy safeguards. This research contributes to the ongoing discourse on improving security measures for AI-driven services and advocates stricter data protection standards within RAG architectures.
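As a rough illustration of the encrypt-before-storage idea described above, the following Python sketch encrypts a text chunk and its embedding under one key and refuses to decrypt without it. The abstract does not specify the cipher, key management, or record layout, so everything here is an assumption: the function names (`encrypt_record`, `decrypt_record`), the record format, and the toy encrypt-then-MAC construction built from HMAC-SHA256 using only the standard library. It is not the paper's scheme, and a real deployment would use a vetted authenticated cipher such as AES-GCM.

```python
import hashlib
import hmac
import secrets
import struct

NONCE_LEN = 16  # bytes of per-message randomness
TAG_LEN = 32    # HMAC-SHA256 output size


def _subkey(key: bytes, label: bytes) -> bytes:
    """Derive an independent subkey for encryption or authentication."""
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce || counter (illustrative only)."""
    blocks = [
        hmac.new(key, nonce + struct.pack(">Q", i), hashlib.sha256).digest()
        for i in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Return nonce || ciphertext || tag (encrypt-then-MAC)."""
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Verify the tag, then decrypt; raise ValueError on a wrong key or tampering."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("record too short")
    nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: wrong key or tampered record")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


def encrypt_record(key: bytes, text: str, embedding: list) -> dict:
    """Encrypt both the text and its embedding before they reach storage."""
    packed = struct.pack(f">{len(embedding)}d", *embedding)  # float64, big-endian
    return {"text": encrypt(key, text.encode("utf-8")), "embedding": encrypt(key, packed)}


def decrypt_record(key: bytes, record: dict) -> tuple:
    """Recover (text, embedding) for a holder of the correct key."""
    text = decrypt(key, record["text"]).decode("utf-8")
    raw = decrypt(key, record["embedding"])
    embedding = list(struct.unpack(f">{len(raw) // 8}d", raw))
    return text, embedding


# Usage: only ciphertext would be written to the (hypothetical) vector store.
key = secrets.token_bytes(32)
stored = encrypt_record(key, "confidential note", [0.25, -1.5, 3.0])
text, embedding = decrypt_record(key, stored)
```

In this sketch an authorized key holder must decrypt the embeddings before running similarity search. How retrieval is performed over the encrypted store is not stated in the abstract, so that step is deliberately left out.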
