ReSQueing Parallel and Private Stochastic Convex Optimization

We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For an SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. Given $n$ samples of Lipschitz loss functions, prior works [BFTT19, BFGT20, AFKT21, KLL21] established that if $n \gtrsim d \epsilon_{\text{dp}}^{-2}$, $(\epsilon_{\text{dp}}, \delta)$-differential privacy is attained at no asymptotic cost to the SCO utility. However, these prior works all required a superlinear number of gradient queries. We close this gap for sufficiently large $n \gtrsim d^2 \epsilon_{\text{dp}}^{-3}$, by using ReSQue to design an algorithm with near-linear gradient query complexity in this regime.
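To make the reweighting idea concrete, the following is a minimal sketch of one way such an estimator could look; it is an illustration under stated assumptions, not the construction or analysis from the paper. It assumes the convolved function is $\hat{f}_\rho = f * \mathcal{N}(0, \rho^2 I)$, so that $\nabla \hat{f}_\rho(x) = \mathbb{E}_{y \sim \mathcal{N}(x, \rho^2 I)}[\nabla f(y)]$, and it rewrites this expectation as an importance-weighted average over queries drawn around a fixed reference point. The names `resque_gradient`, `grad_f`, `x_bar`, and `rho` are hypothetical, and the ball constraint, variance control, and stochastic gradient noise are omitted.

```python
import math
import random


def resque_gradient(grad_f, x, x_bar, rho, num_samples, rng):
    """Sketch: estimate the gradient of f convolved with N(0, rho^2 I) at x,
    using query points sampled around a reference point x_bar (assumed setup)."""
    d = len(x)
    total = [0.0] * d
    for _ in range(num_samples):
        # Draw the query point around the reference x_bar, not around x.
        y = [b + rho * rng.gauss(0.0, 1.0) for b in x_bar]
        # Density ratio N(y; x, rho^2 I) / N(y; x_bar, rho^2 I).
        # The Gaussian normalizing constants cancel in the ratio.
        sq_x = sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        sq_bar = sum((yi - bi) ** 2 for yi, bi in zip(y, x_bar))
        weight = math.exp((sq_bar - sq_x) / (2.0 * rho ** 2))
        g = grad_f(y)
        for i in range(d):
            total[i] += weight * g[i]
    return [t / num_samples for t in total]


# Toy check with f(y) = ||y||^2 / 2, whose gradient is y itself; for this f
# the gradient of the Gaussian-smoothed function at x equals x.
rng = random.Random(0)
estimate = resque_gradient(lambda y: list(y), [0.3, -0.2], [0.0, 0.0], 1.0, 20000, rng)
```

Because the query points depend only on the reference point and not on $x$, the same batch of gradient queries can in principle be reused for any $x$ close to the reference, with only the weights recomputed. The weights become heavy-tailed as $x$ moves away from the reference relative to $\rho$, which is why this sketch keeps the two points close.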
