In privacy-preserving machine learning, differentially private stochastic gradient descent (DP-SGD) performs worse than SGD due to per-sample gradient clipping and noise addition. A recent focus in private learning research is improving the performance of DP-SGD on private data by incorporating priors that are learned on real-world public data. In this work, we explore how we can improve the privacy-utility tradeoff of DP-SGD by learning priors from images generated by random processes and transferring these priors to private data. We propose DP-RandP, a three-phase approach. We attain new state-of-the-art accuracy when training from scratch on CIFAR10, CIFAR100, and MedMNIST for a range of privacy budgets $\varepsilon \in [1, 8]$. In particular, we improve the previous best reported accuracy on CIFAR10 from $60.6 \%$ to $72.3 \%$ for $\varepsilon=1$. Our code is available at https://github.com/inspire-group/DP-RandP.
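To make concrete the two operations the abstract names as the source of DP-SGD's accuracy gap, per-sample gradient clipping and noise addition, the following is a minimal NumPy sketch of a single DP-SGD update. It is illustrative only and is not the DP-RandP method or code from the linked repository. The least-squares model, the function name `dp_sgd_step`, and the parameters `clip_norm` and `noise_multiplier` are assumptions chosen for the example, and privacy accounting (tracking $\varepsilon$) is omitted.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step for least-squares linear regression (illustrative sketch)."""
    # Per-sample gradients of 0.5 * (x.w - y)^2 w.r.t. w, shape (batch, dim).
    residual = X @ w - y
    per_sample_grads = residual[:, None] * X

    # Clip each sample's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale

    # Sum the clipped gradients, add Gaussian noise with standard deviation
    # noise_multiplier * clip_norm, then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / X.shape[0]

    return w - lr * noisy_mean

# Example usage on synthetic data (all values hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = dp_sgd_step(np.zeros(3), X, y, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=rng)
```

Clipping bounds each sample's influence on the update, and the noise scale is tied to that bound; both distort the gradient relative to plain SGD, which is why a better starting point for the model can help under a fixed privacy budget.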