Cross-attention has become a fundamental module in many important artificial intelligence applications, e.g., retrieval-augmented generation (RAG), system prompts, and guided stable diffusion. Ensuring the privacy of cross-attention is crucial and urgent because its key and value matrices may contain sensitive information about companies and their users, many of whom profit solely from their system prompts or RAG data. In this work, we design a novel differentially private (DP) data structure that addresses the privacy of cross-attention with a theoretical guarantee. In detail, let $n$ be the input token length of the system prompt/RAG data, $d$ the feature dimension, $0 < \alpha \le 1$ the relative error parameter, $R$ the maximum entry of the query and key matrices, $R_w$ the maximum entry of the value matrix, and $r, s, \epsilon_s$ the parameters of the polynomial kernel method. Our data structure then requires $\widetilde{O}(ndr^2)$ memory, $\widetilde{O}(nr^2)$ initialization time, and $\widetilde{O}(\alpha^{-1} r^2)$ query time for a single token query. In addition, our data structure guarantees that each user query is $(\epsilon, \delta)$-DP, with $\widetilde{O}(n^{-1} \epsilon^{-1} \alpha^{-1/2} R^{2s} R_w r^2)$ additive error and $n^{-1} (\alpha + \epsilon_s)$ relative error between our output and the true answer. Furthermore, our result is robust to adaptive queries, in which users may intentionally attack the cross-attention system. To our knowledge, this is the first work to provide DP guarantees for cross-attention. We believe it can inspire further privacy-aware algorithm design for large generative models (LGMs).
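To make the privacy setting concrete, the following is a minimal conceptual sketch in NumPy: standard cross-attention $\mathrm{softmax}(QK^\top/\sqrt{d})\,V$ over a private $K, V$ (system prompt / RAG data), with the output perturbed by the Gaussian mechanism to illustrate $(\epsilon, \delta)$-DP. The `sensitivity` parameter and the direct output-perturbation approach are assumptions for illustration only; the paper's actual data structure instead relies on polynomial kernel approximations with the error bounds stated above.

```python
import numpy as np

def cross_attention(Q, K, V):
    """Standard cross-attention: softmax(Q K^T / sqrt(d)) V.

    K and V are derived from the private corpus (system prompt / RAG data);
    Q holds the user's query tokens.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def dp_cross_attention(Q, K, V, epsilon, delta, sensitivity, rng=None):
    """Hypothetical (epsilon, delta)-DP release via the Gaussian mechanism.

    `sensitivity` is the assumed L2 sensitivity of the attention output with
    respect to changing one row of (K, V); calibrating it is nontrivial and
    is exactly what the paper's data structure handles with formal bounds.
    This function is a conceptual sketch, not the paper's algorithm.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = cross_attention(Q, K, V)
    # Classic Gaussian-mechanism noise scale for (epsilon, delta)-DP.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return out + rng.normal(0.0, sigma, size=out.shape)
```

With all-zero queries, the softmax weights are uniform, so the clean output reduces to the row-wise mean of $V$, which gives a quick sanity check of the attention computation.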