Injecting heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years. While various theoretical properties of the resulting algorithm have been analyzed, mainly from the learning-theory and optimization perspectives, its privacy-preservation properties have not yet been established. Aiming to bridge this gap, we provide differential privacy (DP) guarantees for noisy SGD when the injected noise follows an $\alpha$-stable distribution, a family that includes a spectrum of heavy-tailed distributions (with infinite variance) as well as the Gaussian distribution. Considering the $(\epsilon, \delta)$-DP framework, we show that SGD with heavy-tailed perturbations achieves $(0, \tilde{\mathcal{O}}(1/n))$-DP for a broad class of loss functions, which can be non-convex, where $n$ is the number of data points. As a remarkable byproduct, in contrast to prior work that requires gradients with bounded sensitivity or clipping of the iterates, our theory shows that, under mild assumptions, such a projection step is not actually necessary. We illustrate that the heavy-tailed noising mechanism achieves DP guarantees similar to those of the Gaussian case, which suggests that it can be a viable alternative to its light-tailed counterparts.
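To make the setting concrete, the following is a minimal sketch of SGD with additive symmetric $\alpha$-stable noise, not the authors' implementation. Everything named here is an assumption made for illustration: the helper names `sample_sas` and `noisy_sgd`, the least-squares loss, the additive form `theta - eta * grad + sigma * noise`, and all hyperparameter values. The noise is drawn with the Chambers-Mallows-Stuck method; with `alpha = 2` it reduces to a Gaussian with variance 2, and with `alpha < 2` it has infinite variance. The sketch deliberately omits gradient clipping and projection, and running it does not by itself certify any DP guarantee.

```python
import numpy as np


def sample_sas(alpha, size, rng):
    """Draw standard symmetric alpha-stable samples (Chambers-Mallows-Stuck).

    alpha must lie in (0, 2]; alpha = 2 gives a Gaussian with variance 2,
    alpha = 1 gives a standard Cauchy.
    """
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    w = rng.exponential(1.0, size)                # unit-mean exponential
    if alpha == 1.0:
        return np.tan(u)
    return (
        np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
        * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    )


def noisy_sgd(X, y, alpha=1.8, eta=0.01, sigma=0.01, batch=16, steps=200, seed=0):
    """Mini-batch SGD on a least-squares loss with alpha-stable noise.

    Hypothetical illustration: the loss, the noise scaling `sigma`, and the
    default values are assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)         # sample a mini-batch
        grad = X[idx].T @ (X[idx] @ theta - y[idx]) / batch    # least-squares gradient
        noise = sample_sas(alpha, d, rng)                      # heavy-tailed perturbation
        theta = theta - eta * grad + sigma * noise             # no clipping, no projection
    return theta


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    X = data_rng.normal(size=(256, 5))
    y = X @ np.ones(5) + 0.1 * data_rng.normal(size=256)
    print(noisy_sgd(X, y, alpha=1.8))
    print(noisy_sgd(X, y, alpha=2.0))  # Gaussian special case
```

Setting `alpha=2.0` recovers Gaussian-noise SGD, which gives a simple way to compare the heavy-tailed and light-tailed mechanisms on the same data and seed.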