Retrieval-augmented generation (RAG) improves the service quality of large language models by retrieving relevant documents from credible literature and integrating them into the context of the user query. Recently, the rise of cloud RAG services has made it convenient for users to query relevant documents. However, sending queries directly to the cloud risks leaking private information. In this paper, we are the first to formally define the privacy-preserving cloud RAG service, which protects the user query, and we propose RemoteRAG as a solution that addresses privacy, efficiency, and accuracy. For privacy, we introduce $(n,\epsilon)$-DistanceDP to characterize the privacy leakage of the user query and the leakage inferred from relevant documents. For efficiency, we limit the search range from the full document set to a small number of selected documents related to a perturbed embedding generated under $(n,\epsilon)$-DistanceDP, so that the computation and communication costs required for privacy protection decrease significantly. For accuracy, we provide a detailed theoretical analysis ensuring that this small range includes the target documents related to the user query. Experimental results demonstrate that RemoteRAG resists existing embedding inversion attacks while incurring no loss in retrieval under various settings. Moreover, RemoteRAG is efficient, requiring only $0.67$ seconds and $46.66$ KB of data transmission (compared with $2.72$ hours and $1.43$ GB for the non-optimized privacy-preserving scheme) when retrieving from a total of $10^6$ documents.
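The limited-range idea described in the abstract can be illustrated with a minimal sketch. It assumes a standard metric-DP style Laplace perturbation (noise density proportional to $e^{-\epsilon \lVert z \rVert}$), which may differ from the exact mechanism behind $(n,\epsilon)$-DistanceDP, and it replaces the privacy-preserving second stage with a plain local re-ranking. All function names and the candidate count `k_prime` are hypothetical and are not taken from RemoteRAG itself.

```python
import numpy as np


def perturb_embedding(embedding, epsilon, rng):
    """Add n-dimensional Laplace-style noise to an embedding.

    Assumption: the noise has density proportional to exp(-epsilon * ||z||),
    sampled as a uniform direction times a Gamma(n, 1/epsilon) radius.
    """
    n = embedding.shape[0]
    direction = rng.standard_normal(n)
    direction /= np.linalg.norm(direction)  # uniform point on the unit sphere
    radius = rng.gamma(shape=n, scale=1.0 / epsilon)
    return embedding + radius * direction


def top_k_by_distance(query, doc_embeddings, k):
    """Return indices of the k documents closest to `query` (Euclidean)."""
    distances = np.linalg.norm(doc_embeddings - query, axis=1)
    return np.argsort(distances)[:k]


def retrieve_with_limited_range(query, doc_embeddings, epsilon, k, k_prime, rng):
    """Two-stage retrieval: coarse search on the perturbed embedding,
    then exact top-k restricted to the k_prime candidates."""
    # Stage 1 (cloud side): only the perturbed embedding is revealed.
    perturbed = perturb_embedding(query, epsilon, rng)
    candidates = top_k_by_distance(perturbed, doc_embeddings, k_prime)
    # Stage 2: stand-in for the privacy-preserving step. Here the true
    # query is simply compared against the candidate embeddings in the clear.
    local = top_k_by_distance(query, doc_embeddings[candidates], k)
    return candidates[local]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = rng.standard_normal((1000, 16))  # toy document embeddings
    query = rng.standard_normal(16)         # toy query embedding
    print(retrieve_with_limited_range(query, docs, epsilon=10.0, k=5,
                                      k_prime=50, rng=rng))
```

In this sketch, a smaller `epsilon` means larger noise, so `k_prime` must grow for the candidate set to still contain the true top-`k` documents. The costly protected computation then covers only `k_prime` documents rather than the whole collection, which is the source of the efficiency gain the abstract reports.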