Ensuring the privacy of Large Language Models (LLMs) is becoming increasingly important. The most widely adopted technique for this is DP-SGD, which trains a model to guarantee Differential Privacy (DP). However, DP-SGD overestimates an adversary's capabilities by assuming white-box access to the model and, as a result, incurs longer training times and larger memory usage than SGD. On the other hand, commercial LLM deployments are predominantly cloud-based; hence, adversarial access to LLMs is black-box. Motivated by these observations, we present Private Mixing of Ensemble Distributions (PMixED): a private prediction protocol for next-token prediction that utilizes the inherent stochasticity of next-token sampling and a public model to achieve Differential Privacy. We formalize this by introducing RD-mollifiers, which project each model's output distribution from an ensemble of fine-tuned LLMs onto a set around a public LLM's output distribution; we then average the projected distributions and sample from the average. Unlike DP-SGD, which must account for the model architecture during training, PMixED is model-agnostic, making it a very appealing solution for current deployments. Our results show that PMixED achieves a stronger privacy guarantee than sample-level privacy and outperforms DP-SGD at a privacy budget of $\epsilon = 8$ on large-scale datasets. Thus, PMixED offers a practical alternative to DP training methods for achieving strong generative utility without compromising privacy.
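The projection-average-sample mechanism can be illustrated with a minimal sketch. This is not the paper's implementation: the Rényi-divergence order `alpha`, the radius `beta`, the binary-search projection, and the toy three-token distributions are all illustrative assumptions; it only shows the shape of the protocol, i.e., mixing each fine-tuned model's distribution toward the public distribution until it lies within a Rényi-divergence ball, then averaging and sampling.

```python
import numpy as np

def renyi_divergence(p, q, alpha=2.0):
    # Rényi divergence D_alpha(p || q) of order alpha > 1
    return np.log(np.sum(p ** alpha / q ** (alpha - 1))) / (alpha - 1)

def project(p, q, beta, alpha=2.0, steps=30):
    # Binary-search the largest mixing weight lam such that the mixture
    # lam*p + (1-lam)*q stays inside the RD ball of radius beta around q.
    # (Illustrative projection; the paper's RD-mollifier may differ.)
    lo, hi = 0.0, 1.0  # lam = 0 (pure q) is always feasible
    for _ in range(steps):
        lam = (lo + hi) / 2
        mix = lam * p + (1 - lam) * q
        if renyi_divergence(mix, q, alpha) <= beta:
            lo = lam
        else:
            hi = lam
    return lo * p + (1 - lo) * q

# Toy next-token distributions over a 3-token vocabulary (hypothetical values):
# q is the public model, `ensemble` holds the fine-tuned models' outputs.
rng = np.random.default_rng(0)
q = np.array([0.5, 0.3, 0.2])
ensemble = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.6, 0.3])]

# Project each ensemble distribution toward q, average, then sample.
avg = np.mean([project(p, q, beta=0.1) for p in ensemble], axis=0)
next_token = rng.choice(len(avg), p=avg)
```

Because every projected distribution lies within a fixed Rényi-divergence radius of the public distribution, the averaged distribution's dependence on any single fine-tuned model (and hence on any private shard of data) is bounded, which is what enables the DP accounting without touching the training procedure.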