Differentially private gradient descent (DP-GD) is a popular algorithm for training deep learning models with provable guarantees on the privacy of the training data. Over the last decade, the problem of understanding its performance cost relative to standard GD has received considerable attention from the research community, which has derived formal upper bounds on the excess population risk $R_{P}$ in various learning settings. However, existing bounds typically degrade with over-parameterization, i.e., as the number of parameters $p$ exceeds the number of training samples $n$ -- a regime which is ubiquitous in current deep-learning practice. As a result, the lack of theoretical insight leaves practitioners without clear guidance, leading some to reduce the effective number of trainable parameters to improve performance, while others use larger models to achieve better results through scale. In this work, we show that in the popular random features model with quadratic loss, for any sufficiently large $p$, privacy can be obtained for free, i.e., $\left|R_{P}\right| = o(1)$, not only when the privacy parameter $\varepsilon$ has constant order, but also in the strongly private setting $\varepsilon = o(1)$. This challenges the common wisdom that over-parameterization inherently hinders performance in private learning.
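To make the setting concrete, the following is a minimal sketch of DP-GD on a random features model with quadratic loss: a fixed random first layer produces features, and only the linear readout is trained with per-sample gradient clipping plus Gaussian noise. All hyperparameter values (clipping norm, noise scale, step size) are illustrative placeholders, not the paper's choices, and no formal $(\varepsilon,\delta)$ accounting is performed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration: n samples in d dimensions, scalar targets.
# Over-parameterized regime: number of trainable parameters p exceeds n.
n, d, p = 200, 10, 500
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Random features model: the first layer W is drawn once and frozen;
# only the readout vector theta (p parameters) is trained.
W = rng.standard_normal((p, d)) / np.sqrt(d)
Phi = np.maximum(X @ W.T, 0.0)  # (n, p) ReLU random features

# Illustrative DP-GD hyperparameters (not calibrated to a target epsilon).
T, lr, clip, sigma = 100, 0.01, 1.0, 2.0

theta = np.zeros(p)
for _ in range(T):
    # Per-sample gradients of the quadratic loss (1/2)(phi_i . theta - y_i)^2.
    residuals = Phi @ theta - y            # (n,)
    grads = residuals[:, None] * Phi       # (n, p): one gradient row per sample

    # Clip each per-sample gradient to L2 norm at most `clip`,
    # bounding any single sample's influence on the update.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))

    # Average the clipped gradients and add Gaussian noise whose scale
    # is proportional to the clipping norm (the sensitivity of the mean).
    noisy_grad = grads.mean(axis=0) + (sigma * clip / n) * rng.standard_normal(p)
    theta -= lr * noisy_grad

train_loss = 0.5 * np.mean((Phi @ theta - y) ** 2)
```

The clipping step bounds the sensitivity of the averaged gradient to any one training example, so adding Gaussian noise of matching scale yields a differentially private update; the paper's result concerns how the resulting excess population risk behaves as $p$ grows.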