We consider the task of privately obtaining prediction error guarantees in ordinary least-squares regression problems with Gaussian covariates (with unknown covariance structure). We provide the first sample-optimal polynomial-time algorithm for this task under both pure and approximate differential privacy. We show that any improvement to the sample complexity of our algorithm would violate either statistical-query or information-theoretic lower bounds. Additionally, our algorithm is robust to a small fraction of arbitrary outliers and achieves optimal error rates as a function of the fraction of outliers. In contrast, all prior efficient algorithms either incurred sample complexities with sub-optimal dimension dependence, scaled with the condition number of the covariates, or obtained a polynomially worse dependence on the privacy parameters. Our technical contributions are twofold: first, we leverage resilience guarantees of Gaussians within the sum-of-squares framework. As a consequence, we obtain efficient sum-of-squares algorithms for regression with optimal robustness rates and sample complexity. Second, we generalize the recent robustness-to-privacy framework [HKMN23, (arXiv:2212.05015)] to account for the geometry induced by the covariance of the input samples. This framework crucially relies on the robust estimators being sum-of-squares algorithms, and combining the two steps yields a sample-optimal private regression algorithm. We believe our techniques are of independent interest, and we demonstrate this by obtaining an efficient algorithm for covariance-aware mean estimation, with an optimal dependence on the privacy parameters.
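To make the problem setup concrete, the following sketch (not the paper's algorithm, and with all numerical choices illustrative) shows the non-private task the abstract refers to: ordinary least squares on Gaussian covariates with an unknown covariance Σ, where prediction error on a fresh sample corresponds to the Σ-weighted distance between the estimate and the true regression vector.

```python
# Illustrative sketch of the OLS-with-Gaussian-covariates setting.
# This is NOT the paper's private or robust algorithm; dimensions,
# sample size, and the choice of Sigma below are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 10_000

# "Unknown covariance structure": a generic positive-definite Sigma.
A = rng.standard_normal((d, d))
Sigma = A @ A.T / d

beta = rng.standard_normal(d)              # ground-truth regression vector
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
y = X @ beta + rng.standard_normal(n)      # Gaussian label noise

# Non-private OLS estimate via least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Expected prediction error on a fresh Gaussian sample x ~ N(0, Sigma) is
# E[(x @ (beta_hat - beta))^2] = (beta_hat - beta)^T Sigma (beta_hat - beta),
# i.e. error is measured in the geometry induced by Sigma.
err = (beta_hat - beta) @ Sigma @ (beta_hat - beta)
```

The paper's contribution is to achieve this covariance-induced error guarantee privately and robustly, with optimal sample complexity; the snippet only fixes the notion of prediction error being targeted.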