We study the canonical statistical estimation problem of linear regression from $n$ i.i.d.~examples under $(\varepsilon,\delta)$-differential privacy when some response variables are adversarially corrupted. We propose a variant of the popular differentially private stochastic gradient descent (DP-SGD) algorithm with two innovations: full-batch gradient descent to improve sample complexity and a novel adaptive clipping scheme to guarantee robustness. When there is no adversarial corruption, this algorithm improves upon the existing state-of-the-art approach and achieves near-optimal sample complexity. Under label corruption, this is the first efficient linear regression algorithm to guarantee both $(\varepsilon,\delta)$-DP and robustness. Experiments on synthetic data confirm the superiority of our approach.
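To make the full-batch idea concrete, the following is a minimal sketch of noisy full-batch gradient descent for least squares with per-example gradient clipping. It is not the algorithm proposed here: the clipping threshold `clip` is held fixed rather than chosen adaptively, and the function names, parameters (`noise_multiplier`, `lr`, `steps`), and the use of NumPy are all assumptions made for illustration. Calibrating `noise_multiplier` to a target $(\varepsilon,\delta)$ is left out.

```python
import numpy as np


def clip_rows(grads, clip):
    """Rescale each row of `grads` so its Euclidean norm is at most `clip`."""
    norms = np.linalg.norm(grads, axis=1)
    scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    return grads * scale[:, None]


def noisy_full_batch_gd(X, y, clip, noise_multiplier, lr, steps, rng):
    """Full-batch gradient descent on squared loss with clipped, noised gradients.

    Assumption: `clip` is a fixed threshold. The adaptive rule described in the
    abstract is NOT implemented here, and no privacy accounting is performed.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        residuals = X @ w - y                    # shape (n,)
        grads = residuals[:, None] * X           # per-example gradients, shape (n, d)
        clipped_sum = clip_rows(grads, clip).sum(axis=0)
        # Gaussian noise scaled to the per-example sensitivity `clip`.
        noise = rng.normal(0.0, noise_multiplier * clip, size=d)
        w = w - lr * (clipped_sum + noise) / n
    return w
```

Every step uses all $n$ examples, so the noise added to the summed gradient is divided by $n$ rather than by a mini-batch size. A fixed threshold like the one above bounds the influence of any single example, including one with a corrupted label, but choosing it well is exactly what an adaptive scheme is meant to address.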