We introduce new differentially private (DP) mechanisms for gradient-based machine learning (ML) with multiple passes (epochs) over a dataset, substantially improving the achievable privacy-utility-computation tradeoffs. We formalize the problem of DP mechanisms for adaptive streams with multiple participations and introduce a non-trivial extension of online matrix factorization DP mechanisms to our setting. This includes establishing the necessary theory for sensitivity calculations and efficient computation of optimal matrices. For some applications, such as those with more than $10{,}000$ SGD steps, applying these optimal techniques becomes computationally expensive. We thus design an efficient Fourier-transform-based mechanism with only a minor utility loss. Extensive empirical evaluation on both example-level DP for image classification and user-level DP for language modeling demonstrates substantial improvements over all previous methods, including the widely used DP-SGD. Though our primary application is to ML, our main DP results are applicable to arbitrary linear queries and hence may have much broader applicability.
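To make the matrix factorization idea concrete, the following is a minimal sketch, not the optimal matrices or the Fourier-transform-based mechanism described above. It assumes a scalar stream with entries in $[-1, 1]$, a prefix-sum workload $A$ (the lower-triangular all-ones matrix), and the simple hand-picked factorization $A = BC$ with $B = C = A^{1/2}$, whose Toeplitz coefficients are the Taylor coefficients of $(1-x)^{-1/2}$. The participation pattern (each contributor appears at steps $s, s+b, \dots, s+(k-1)b$ with $n = kb$) and all function names are illustrative assumptions. The mapping from `noise_multiplier` to a concrete $(\varepsilon, \delta)$ guarantee is not covered here.

```python
import math
import random


def sqrt_prefix_coeffs(n):
    """First n Taylor coefficients of (1 - x)^(-1/2)."""
    coeffs = [1.0]
    for k in range(1, n):
        coeffs.append(coeffs[-1] * (2 * k - 1) / (2 * k))
    return coeffs


def toeplitz_lower(coeffs):
    """Lower-triangular Toeplitz matrix with the given first column."""
    n = len(coeffs)
    return [[coeffs[i - j] if i >= j else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][m] * Q[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def matvec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def participation_sensitivity(C, k, b):
    """L2 sensitivity of x -> Cx when one contributor changes up to k
    scalar entries, spaced b steps apart (assumed pattern). Because C
    has non-negative entries, the worst case sums the affected columns."""
    n = len(C)
    best = 0.0
    for start in range(min(b, n)):
        cols = [start + j * b for j in range(k) if start + j * b < n]
        summed = [sum(C[i][c] for c in cols) for i in range(n)]
        best = max(best, math.sqrt(sum(v * v for v in summed)))
    return best


def mf_mechanism(x, k, b, noise_multiplier, rng):
    """Noisy prefix sums B(Cx + z) = Ax + Bz with Gaussian z."""
    n = len(x)
    C = toeplitz_lower(sqrt_prefix_coeffs(n))
    B = C  # hand-picked choice: C @ C equals the prefix-sum matrix A
    sens = participation_sensitivity(C, k, b)
    z = [rng.gauss(0.0, noise_multiplier * sens) for _ in range(n)]
    noisy = [cx + zi for cx, zi in zip(matvec(C, x), z)]
    return matvec(B, noisy)
```

Calling `mf_mechanism([1.0] * 6, k=2, b=3, noise_multiplier=1.0, rng=random.Random(0))` returns six noisy prefix sums. The sketch shows why multiple participations matter: the noise scale is set by the norm of a sum of columns of $C$ rather than by a single column.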