Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplifying assumption of uniform Lipschitzness, i.e., that the per-sample gradients are uniformly bounded. We generalize uniform Lipschitzness by assuming that the per-sample gradients have sample-dependent upper bounds, i.e., per-sample Lipschitz constants, which may themselves be unbounded. For convex over-parameterized settings that satisfy this general version of Lipschitzness with bounded per-sample Lipschitz constants, we provide principled guidance on choosing the clip norm in DP-SGD; specifically, we recommend tuning the clip norm only over values up to the minimum per-sample Lipschitz constant. This finds application in the private training of a softmax layer on top of a deep network pre-trained on public data. We verify the efficacy of our recommendation via experiments on eight datasets. Furthermore, we provide new convergence results for DP-SGD on convex and nonconvex functions when the per-sample Lipschitz constants are unbounded but have bounded moments, i.e., they are heavy-tailed.
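To make the clip-norm recommendation concrete, the following is a minimal sketch of a standard DP-SGD step (per-sample clipping, then Gaussian noise) together with a candidate clip-norm grid capped at the minimum per-sample Lipschitz constant. It is an illustration under stated assumptions, not the paper's implementation: the function names, the geometric spacing of the grid, and the noise calibration `sigma = noise_multiplier * clip_norm` are hypothetical choices made here, and the per-sample Lipschitz constants are assumed to be known or upper-bounded in advance.

```python
import math
import random


def clip(grad, clip_norm):
    """Rescale grad so that its Euclidean norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(w, per_sample_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD step: clip each per-sample gradient, sum, add noise, average."""
    dim = len(w)
    total = [0.0] * dim
    for grad in per_sample_grads:
        clipped = clip(grad, clip_norm)
        for j in range(dim):
            total[j] += clipped[j]
    # Assumption: Gaussian noise with standard deviation proportional to the clip norm.
    sigma = noise_multiplier * clip_norm
    n = len(per_sample_grads)
    noisy_mean = [(total[j] + rng.gauss(0.0, sigma)) / n for j in range(dim)]
    return [w[j] - lr * noisy_mean[j] for j in range(dim)]


def clip_norm_grid(lipschitz_constants, num_values=5):
    """Hypothetical geometric grid of clip norms whose largest value is min_i L_i."""
    c_max = min(lipschitz_constants)
    return [c_max / (2 ** k) for k in range(num_values - 1, -1, -1)]


if __name__ == "__main__":
    rng = random.Random(0)
    lipschitz = [4.0, 2.0, 8.0]  # assumed per-sample Lipschitz constants
    grads = [[3.0, 4.0], [0.0, 1.0], [-2.0, 0.5]]  # toy per-sample gradients
    for c in clip_norm_grid(lipschitz, num_values=3):
        w_next = dp_sgd_step([0.0, 0.0], grads, c, noise_multiplier=1.0, lr=0.1, rng=rng)
        print(c, w_next)
```

In this sketch the search over clip norms stops at `min(lipschitz_constants)`, mirroring the recommendation to tune only up to the minimum per-sample Lipschitz constant; how the candidates are spaced below that cap is a free design choice.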