In this paper we revisit the DP stochastic convex optimization (SCO) problem. For convex smooth losses, it is well-known that the canonical DP-SGD (stochastic gradient descent) achieves the optimal rate of $O\left(\frac{LR}{\sqrt{n}} + \frac{LR \sqrt{p \log(1/\delta)}}{\epsilon n}\right)$ under $(\epsilon, \delta)$-DP, and also that variants of DP-SGD can achieve the optimal rate in a single epoch. However, the batch gradient complexity (i.e., the number of adaptive optimization steps), which is important in applications like federated learning, is less well understood. In particular, all prior work on DP-SCO requires $\Omega(n)$ batch gradient steps, multiple epochs, or convexity for privacy. We propose an algorithm, Accelerated-DP-SRGD (stochastic recursive gradient descent), which bypasses the limitations of past work: it achieves the optimal rate for DP-SCO (up to polylog factors) in a single epoch using $\sqrt{n}$ batch gradient steps with batch size $\sqrt{n}$, and it can be made private for arbitrary (non-convex) losses via clipping. If the global minimizer lies in the constraint set, we can further improve this to $n^{1/4}$ batch gradient steps with batch size $n^{3/4}$. To achieve this, our algorithm combines three key ingredients: a variant of stochastic recursive gradients (SRG), accelerated gradient descent, and correlated noise generation from DP continual counting.
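To fix ideas, a minimal sketch of the canonical DP-SGD baseline mentioned above (per-example clipping to norm $C$, then Gaussian noise on the batch gradient); this is an illustrative sketch only, not the paper's Accelerated-DP-SRGD algorithm, and the function names and hyperparameters here are our own assumptions:

```python
import numpy as np

def clip(g, C):
    """Clip a per-example gradient to L2 norm at most C (enables privacy for non-convex losses)."""
    norm = np.linalg.norm(g)
    return g * min(1.0, C / norm) if norm > 0 else g

def dp_sgd_step(w, per_example_grads, C, sigma, lr, rng):
    """One DP-SGD step: clip each gradient, sum, add N(0, (sigma*C)^2 I) noise, average, descend."""
    B = len(per_example_grads)
    clipped = [clip(g, C) for g in per_example_grads]
    noisy_mean = (np.sum(clipped, axis=0)
                  + rng.normal(0.0, sigma * C, size=w.shape)) / B
    return w - lr * noisy_mean

# Toy usage with a batch of random per-example gradients.
rng = np.random.default_rng(0)
w = np.zeros(3)
grads = [rng.normal(size=3) for _ in range(4)]
w = dp_sgd_step(w, grads, C=1.0, sigma=1.0, lr=0.1, rng=rng)
```

The paper's algorithm departs from this baseline by replacing the fresh minibatch gradient with a stochastic recursive gradient estimate, adding acceleration, and injecting correlated (rather than independent) noise across steps via DP continual counting.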