Retrieval-Augmented Generation (RAG) is essential for enhancing Large Language Models (LLMs) with external knowledge, but its reliance on cloud environments exposes sensitive data to privacy risks. Existing privacy-preserving solutions often sacrifice retrieval quality by injecting noise, or provide only partial encryption. We propose PRAG, a privacy-preserving RAG system that provides end-to-end confidentiality for both documents and queries without sacrificing the scalability of cloud-hosted RAG. PRAG has a dual-mode architecture: the non-interactive PRAG-I uses homomorphic-friendly approximations for low-latency retrieval, while the interactive PRAG-II relies on client assistance to match the accuracy of non-private RAG. To keep semantic ordering robust, we introduce Operation-Error Estimation (OEE), a mechanism that stabilizes ranking against homomorphic noise. Experiments on large-scale datasets show that PRAG achieves competitive recall (72.45% to 74.45%), practical retrieval latency, and strong resilience against graph reconstruction attacks while maintaining end-to-end confidentiality. These results support the feasibility of secure, high-performance RAG at scale.
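To make the ranking-stability concern concrete, the following is a minimal sketch of why bounded noise on similarity scores matters for ordering. It is not PRAG's OEE mechanism, which the abstract does not specify. It assumes only that each score carries an additive error of magnitude at most `eps` (a hypothetical bound standing in for homomorphic noise), and it uses an inner product because that needs only additions and multiplications. All function names and the toy vectors are illustrative assumptions.

```python
import random


def dot(u, v):
    """Inner product built from additions and multiplications only."""
    return sum(a * b for a, b in zip(u, v))


def noisy_scores(query, docs, eps, rng):
    """Simulate scores whose error is bounded by eps in absolute value.

    The uniform noise here is a stand-in assumption, not a model of any
    specific homomorphic encryption scheme.
    """
    return [dot(query, d) + rng.uniform(-eps, eps) for d in docs]


def certified_pairs(scores, eps):
    """Return index pairs (i, j), i < j, whose observed order is trustworthy.

    If two noisy scores differ by more than 2 * eps, the true scores must
    be ordered the same way, because each score moved by at most eps.
    """
    n = len(scores)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if abs(scores[i] - scores[j]) > 2 * eps
    ]


# Toy data (hypothetical): one query and three document embeddings.
query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.5, 0.5], [0.45, 0.6]]
eps = 0.05

observed = noisy_scores(query, docs, eps, random.Random(42))
print("noisy scores:", observed)
print("pairs with certified order:", certified_pairs(observed, eps))
```

In this toy setup, documents 1 and 2 have true scores only 0.05 apart, so their relative order is usually not certifiable at `eps = 0.05`. Near-ties of this kind are where an error-aware ranking step would need to intervene.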