We provide an improved analysis of standard differentially private gradient descent for linear regression under the squared error loss. Under modest assumptions on the input, we characterize the distribution of the iterate at each time step. Our analysis leads to new results on the algorithm's accuracy: for a proper fixed choice of hyperparameters, the sample complexity depends only linearly on the dimension of the data. This matches the dimension-dependence of the (non-private) ordinary least squares estimator as well as that of recent private algorithms that rely on sophisticated adaptive gradient-clipping schemes (Varshney et al., 2022; Liu et al., 2023). Our analysis of the iterates' distribution also allows us to construct confidence intervals for the empirical optimizer which adapt automatically to the variance of the algorithm on a particular data set. We validate our theorems through experiments on synthetic data.
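To make the object of study concrete, the following is a minimal sketch of noisy, clipped full-batch gradient descent for least squares, the general form that differentially private gradient descent usually takes. It is an illustration under stated assumptions, not the algorithm exactly as analyzed here: the function name `dp_gradient_descent` and the parameters `steps`, `lr`, `clip`, and `noise_mult` are hypothetical, and the calibration of `noise_mult` to a target privacy guarantee is omitted entirely.

```python
import numpy as np


def dp_gradient_descent(X, y, steps=50, lr=0.1, clip=1.0, noise_mult=1.0, seed=0):
    """Noisy clipped gradient descent for least squares (illustrative sketch).

    Assumptions (not taken from the paper): fixed clipping threshold `clip`,
    Gaussian noise with standard deviation `noise_mult * clip` added to the
    summed gradient, and no accounting of the resulting privacy budget.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        residuals = X @ theta - y                     # shape (n,)
        grads = residuals[:, None] * X                # per-example gradients, shape (n, d)
        norms = np.linalg.norm(grads, axis=1)
        # Rescale each per-example gradient so its norm is at most `clip`.
        scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
        clipped = grads * scale[:, None]
        # Gaussian perturbation of the summed clipped gradient.
        noise = rng.normal(0.0, noise_mult * clip, size=d)
        theta = theta - lr * (clipped.sum(axis=0) + noise) / n
    return theta


# Usage on synthetic data (all values here are arbitrary choices).
data_rng = np.random.default_rng(42)
X_demo = data_rng.normal(size=(1000, 5))
theta_star = np.arange(1.0, 6.0)
y_demo = X_demo @ theta_star + 0.1 * data_rng.normal(size=1000)
theta_hat = dp_gradient_descent(X_demo, y_demo, steps=100, lr=0.5, clip=50.0)
```

Because each iterate is a deterministic update plus independent Gaussian noise, the final iterate is random even for a fixed data set; this run-to-run variability is what the confidence intervals described above are meant to capture.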