Retrieval-Augmented Generation (RAG) enables large language models to use external knowledge, but outsourcing the RAG service raises privacy concerns for both data owners and users. Privacy-preserving RAG systems address these concerns by performing secure top-$k$ retrieval, which typically relies on secure sorting to identify relevant documents. However, existing systems struggle to support arbitrary $k$ because they either cannot change $k$, introduce new security issues, or degrade in efficiency when $k$ is large. This is a significant limitation because modern long-context models generally achieve higher accuracy with larger retrieval sets. We propose $p^2$RAG, a privacy-preserving RAG service that supports arbitrary top-$k$ retrieval. Unlike existing systems, $p^2$RAG avoids sorting candidate documents. Instead, it uses an interactive bisection method to determine the set of top-$k$ documents. For security, $p^2$RAG uses secret sharing across two semi-honest, non-colluding servers to protect the data owner's database and the user's prompt. It also enforces restrictions and verification to defend against malicious users and to tightly bound the information leaked about the database. Experiments show that $p^2$RAG is 3--300$\times$ faster than the state-of-the-art PRAG for $k = 16$--$1024$.
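To illustrate why a top-$k$ set can be found without sorting, the following is a minimal plaintext sketch of threshold bisection. It is not the $p^2$RAG protocol itself: the function name `top_k_by_bisection`, the integer score range `[lo, hi]`, and the assumption of distinct scores are all hypothetical choices made for this example, and the comparisons that run in the clear here would presumably be evaluated on secret shares held by the two servers.

```python
from typing import List


def top_k_by_bisection(scores: List[int], k: int, lo: int, hi: int) -> List[int]:
    """Return the indices of the k highest scores without sorting.

    Illustrative assumptions (not taken from the paper):
      * scores are integers in the closed range [lo, hi],
      * scores are distinct, so exactly k of them pass the final threshold,
      * 1 <= k <= len(scores).
    """
    # Invariant: at least k scores are >= lo, and fewer than k are >= hi + 1.
    # The loop finds the largest threshold t such that at least k scores are >= t.
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the interval always shrinks
        count = sum(1 for s in scores if s >= mid)
        if count >= k:
            lo = mid       # threshold can still be raised
        else:
            hi = mid - 1   # threshold is too high, fewer than k pass
    # With distinct scores, exactly k entries are >= lo at this point.
    return [i for i, s in enumerate(scores) if s >= lo]


if __name__ == "__main__":
    similarity = [12, 87, 45, 3, 66, 91, 30]
    print(top_k_by_bisection(similarity, k=3, lo=0, hi=100))  # [1, 4, 5]
```

In this sketch each round only needs a count of how many scores pass the current threshold, and the number of rounds grows with the logarithm of the score range rather than with $k$. If scores may tie, more than $k$ indices can pass the final threshold, so a tie-breaking rule would be needed.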