While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains. Generative models trained with differential privacy (DP) on sensitive data can sidestep this challenge, providing access to synthetic data instead. We build on the recent success of diffusion models (DMs) and introduce Differentially Private Diffusion Models (DPDMs), which enforce privacy using differentially private stochastic gradient descent (DP-SGD). We investigate the DM parameterization and the sampling algorithm, which turn out to be crucial ingredients in DPDMs, and propose noise multiplicity, a powerful modification of DP-SGD tailored to the training of DMs. We validate our novel DPDMs on image generation benchmarks and achieve state-of-the-art performance in all experiments. Moreover, on standard benchmarks, classifiers trained on DPDM-generated synthetic data perform on par with task-specific DP-SGD-trained classifiers, which has not been demonstrated before for DP generative models. Project page and code: https://nv-tlabs.github.io/DPDM.
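To make the training mechanism concrete, the following is a minimal sketch of one DP-SGD update with a noise-multiplicity-style average. The abstract names both ideas but does not spell them out, so the details here are assumptions: noise multiplicity is read as averaging each example's denoising loss over `k` independent noise draws before the per-example gradient is clipped. The toy elementwise linear denoiser, the noise-level range, and all function and parameter names (`clip_to_norm`, `example_grad`, `dp_sgd_step`, `max_norm`, `noise_multiplier`) are hypothetical and are not taken from the DPDM code.

```python
import math
import random


def clip_to_norm(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm (DP-SGD clipping)."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def example_grad(w, x, rng, k):
    """Gradient of a toy denoising loss for ONE example, averaged over k noise draws.

    Assumed toy model: pred_i = w_i * noisy_i, with squared error against the
    clean value x_i. A real diffusion model would use a neural network here.
    """
    total = [0.0] * len(w)
    for _ in range(k):
        sigma = rng.uniform(0.1, 1.0)  # hypothetical noise-level range
        for i, (wi, xi) in enumerate(zip(w, x)):
            noisy = xi + sigma * rng.gauss(0.0, 1.0)
            total[i] += 2.0 * (wi * noisy - xi) * noisy
    return [t / k for t in total]


def dp_sgd_step(w, batch, rng, max_norm=1.0, noise_multiplier=1.0, lr=0.1, k=4):
    """One DP-SGD update: clip per-example gradients, sum, add Gaussian noise."""
    summed = [0.0] * len(w)
    for x in batch:
        g = clip_to_norm(example_grad(w, x, rng, k), max_norm)
        summed = [s + gi for s, gi in zip(summed, g)]
    std = noise_multiplier * max_norm  # noise scale tied to the clipping bound
    noisy_mean = [(s + rng.gauss(0.0, std)) / len(batch) for s in summed]
    return [wi - lr * gi for wi, gi in zip(w, noisy_mean)]


# Example usage on random toy data.
rng = random.Random(0)
weights = [1.0] * 5
data = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(8)]
weights = dp_sgd_step(weights, data, rng)
```

In this sketch the average over `k` draws happens before clipping, so each example still contributes a gradient of norm at most `max_norm`. The added Gaussian noise therefore does not need to grow with `k`, while the per-example gradient estimate becomes less noisy.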