While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains. Generative models trained with differential privacy (DP) on sensitive data can sidestep this challenge, providing access to synthetic data instead. We build on the recent success of diffusion models (DMs) and introduce Differentially Private Diffusion Models (DPDMs), which enforce privacy using differentially private stochastic gradient descent (DP-SGD). We investigate the DM parameterization and the sampling algorithm, which turn out to be crucial ingredients in DPDMs, and propose noise multiplicity, a powerful modification of DP-SGD tailored to the training of DMs. We validate our novel DPDMs on image generation benchmarks and achieve state-of-the-art performance in all experiments. Moreover, on standard benchmarks, classifiers trained on DPDM-generated synthetic data perform on par with task-specific DP-SGD-trained classifiers, which has not been demonstrated before for DP generative models. Project page and code: https://nv-tlabs.github.io/DPDM.
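To make the combination of DP-SGD and noise multiplicity concrete, the following is a minimal sketch on a toy linear denoiser, not the DPDM implementation. The model, the log-normal noise-level distribution, and all names (`per_example_grad`, `dp_sgd_step`, `K`, `clip_norm`, `noise_multiplier`) are assumptions chosen for illustration. The idea shown is that each example's denoising loss is averaged over `K` noise draws before its gradient is clipped, so each example still contributes a single clipped gradient to the noisy sum.

```python
import numpy as np


def per_example_grad(W, x, K, rng):
    """Gradient of a toy denoising loss for ONE example, averaged over K noise draws.

    Toy model (assumption): a linear denoiser D(y) = W @ y with loss ||W @ y - x||^2,
    where y = x + sigma * eps is the perturbed example.
    """
    grad = np.zeros_like(W)
    for _ in range(K):
        # Hypothetical log-normal noise-level distribution, for illustration only.
        sigma = np.exp(rng.normal(-1.2, 1.2))
        y = x + sigma * rng.standard_normal(x.shape[0])
        residual = W @ y - x
        grad += 2.0 * np.outer(residual, y)  # d/dW of ||W y - x||^2
    return grad / K  # averaging over K draws reduces gradient variance


def dp_sgd_step(W, batch, K, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD update: clip each per-example gradient, sum, add Gaussian noise."""
    total = np.zeros_like(W)
    for x in batch:
        g = per_example_grad(W, x, K, rng)
        # Rescale so the per-example gradient has Frobenius norm at most clip_norm.
        g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        total += g
    # Gaussian noise calibrated to the clipping norm (the per-example sensitivity).
    total += noise_multiplier * clip_norm * rng.standard_normal(W.shape)
    return W - lr * total / len(batch)


rng = np.random.default_rng(0)
W = np.zeros((4, 4))
batch = rng.standard_normal((8, 4))
W = dp_sgd_step(W, batch, K=4, clip_norm=1.0, noise_multiplier=1.0, lr=0.1, rng=rng)
```

In this sketch, increasing `K` changes only how each per-example gradient is estimated; clipping and noise addition still act once per example. A privacy accountant, omitted here, would be needed to turn `noise_multiplier`, the sampling rate, and the number of steps into an (ε, δ) guarantee.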