In privacy-preserving machine learning, differentially private stochastic gradient descent (DP-SGD) typically attains lower accuracy than non-private SGD because of per-sample gradient clipping and noise addition. A recent focus in private learning research is improving the performance of DP-SGD on private data by incorporating priors learned from real-world public data. In this work, we explore how to improve the privacy-utility tradeoff of DP-SGD by learning priors from images generated by random processes and transferring these priors to private data. We propose DP-RandP, a three-phase approach. We attain new state-of-the-art accuracy when training from scratch on CIFAR10, CIFAR100, MedMNIST and ImageNet for a range of privacy budgets $\varepsilon \in [1, 8]$. In particular, we improve the previous best reported accuracy on CIFAR10 from $60.6\%$ to $72.3\%$ for $\varepsilon=1$.
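To make the two sources of utility loss named above concrete, the following is a minimal sketch of a single generic DP-SGD update: each per-sample gradient is clipped to a fixed L2 norm, the clipped gradients are summed, and Gaussian noise scaled to the clipping norm is added before averaging. It illustrates standard DP-SGD only, not the DP-RandP pipeline. The function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions, and privacy accounting (computing $\varepsilon$ from the noise level and number of steps) is deliberately omitted.

```python
import math
import random


def dp_sgd_step(params, per_sample_grads, lr, clip_norm, noise_multiplier, rng):
    """One generic DP-SGD update (illustrative sketch, hypothetical names).

    params: list of floats; per_sample_grads: one gradient list per example.
    """
    batch_size = len(per_sample_grads)
    summed = [0.0] * len(params)
    for grad in per_sample_grads:
        # Clip: rescale the gradient so its L2 norm is at most clip_norm.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, g in enumerate(grad):
            summed[i] += scale * g
    # Noise: Gaussian with standard deviation noise_multiplier * clip_norm,
    # added once per coordinate to the sum of clipped gradients.
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * m for p, m in zip(params, noisy_mean)]


# Usage: noise is disabled here so that only the clipping effect is visible.
rng = random.Random(0)
grads = [[3.0, 4.0, 0.0], [0.3, 0.4, 0.0]]  # L2 norms 5.0 and 0.5
new_params = dp_sgd_step([0.0, 0.0, 0.0], grads, lr=0.1,
                         clip_norm=1.0, noise_multiplier=0.0, rng=rng)
```

In this toy batch the first gradient is scaled down to unit norm while the second passes through unchanged, which shows how clipping biases the update toward small-norm examples; a nonzero `noise_multiplier` then adds variance on top of that bias.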