Differentially Private Non-Convex Optimization under the KL Condition with Optimal Rates

We study the private empirical risk minimization (ERM) problem for losses satisfying the $(\gamma,\kappa)$-Kurdyka-{\L}ojasiewicz (KL) condition. The Polyak-{\L}ojasiewicz (PL) condition is the special case of this condition with $\kappa=2$. Specifically, we study this problem under the constraint of $\rho$-zero-concentrated differential privacy (zCDP). When $\kappa\in[1,2]$ and the loss function is Lipschitz and smooth over a sufficiently large region, we provide a new algorithm based on variance-reduced gradient descent that achieves the rate $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^\kappa\big)$ on the excess empirical risk, where $n$ is the dataset size and $d$ is the dimension. We further show that this rate is nearly optimal. When $\kappa \geq 2$ and the loss is instead Lipschitz and weakly convex, we show that the rate $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^\kappa\big)$ is achievable with a private implementation of the proximal point method. When the KL parameters are unknown, we provide a novel modification and analysis of the noisy gradient descent algorithm and show that this algorithm adaptively achieves a rate of $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{\frac{2\kappa}{4-\kappa}}\big)$, which is nearly optimal when $\kappa = 2$. We further show that, without assuming the KL condition, the same gradient descent algorithm can achieve fast convergence to a stationary point when the gradient stays sufficiently large during the run of the algorithm. Specifically, we show that this algorithm can approximate stationary points of Lipschitz, smooth (and possibly nonconvex) objectives at a rate as fast as $\tilde{O}\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)$ and never worse than $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{1/2}\big)$. The latter rate matches the best known rate for methods that do not rely on variance reduction.
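
To make the privacy constraint concrete, the following is a minimal sketch of plain noisy gradient descent under $\rho$-zCDP. It is not the modified algorithm or the variance-reduced method described above. It only illustrates two standard facts: Gaussian noise with standard deviation $\Delta/\sqrt{2\rho}$ makes a query of $\ell_2$-sensitivity $\Delta$ satisfy $\rho$-zCDP, and zCDP budgets add up across steps. The function names, the per-example clipping used to enforce the Lipschitz bound, the replace-one notion of neighboring datasets, and the step-size choice are all assumptions made for illustration.

```python
import numpy as np


def gaussian_sigma(sensitivity, rho):
    """Noise std that gives rho-zCDP for a query with this L2 sensitivity."""
    return sensitivity / np.sqrt(2.0 * rho)


def noisy_gradient_descent(per_example_grad, data, w0, lipschitz, rho,
                           steps, lr, rng):
    """Hypothetical helper: gradient descent with Gaussian noise each step.

    per_example_grad(w, z) returns the gradient of the loss at w on example z.
    Gradients are clipped to norm `lipschitz` (an assumption standing in for
    a Lipschitz loss), so the averaged gradient has L2 sensitivity 2L/n
    under replacement of one example.
    """
    n = len(data)
    sensitivity = 2.0 * lipschitz / n
    # zCDP composes additively, so each of the `steps` releases gets rho/steps.
    sigma = gaussian_sigma(sensitivity, rho / steps)

    w = np.array(w0, dtype=float)
    for _ in range(steps):
        grads = np.stack([per_example_grad(w, z) for z in data])
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        # Scale down any per-example gradient whose norm exceeds `lipschitz`.
        grads = grads * np.minimum(1.0, lipschitz / np.maximum(norms, 1e-12))
        noisy_grad = grads.mean(axis=0) + rng.normal(0.0, sigma, size=w.shape)
        w = w - lr * noisy_grad
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 5))
    # Squared-distance loss 0.5 * ||w - z||^2, whose gradient is w - z.
    w_private = noisy_gradient_descent(
        lambda w, z: w - z, data, np.zeros(5),
        lipschitz=10.0, rho=1.0, steps=100, lr=0.5, rng=rng,
    )
    print(np.linalg.norm(w_private - data.mean(axis=0)))
```

In this sketch the per-step noise scale is $\frac{2L}{n}\sqrt{\frac{T}{2\rho}}$ for $T$ steps, so the noise norm grows like $\sqrt{d}$ and shrinks like $\frac{1}{n\sqrt{\rho}}$, which is the source of the $\frac{\sqrt{d}}{n\sqrt{\rho}}$ factor that appears in the rates above.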
