We consider the task of privately obtaining prediction error guarantees in ordinary least-squares regression problems with Gaussian covariates (with unknown covariance structure). We provide the first sample-optimal polynomial-time algorithm for this task under both pure and approximate differential privacy. We show that any improvement to the sample complexity of our algorithm would violate either statistical-query or information-theoretic lower bounds. Additionally, our algorithm is robust to a small fraction of arbitrary outliers and achieves optimal error rates as a function of the fraction of outliers. In contrast, all prior efficient algorithms either incurred sample complexities with sub-optimal dimension dependence, scaled with the condition number of the covariates, or obtained a polynomially worse dependence on the privacy parameters. Our technical contributions are twofold: first, we leverage resilience guarantees of Gaussians within the sum-of-squares framework. As a consequence, we obtain efficient sum-of-squares algorithms for regression with optimal robustness rates and sample complexity. Second, we generalize the recent robustness-to-privacy framework [HKMN23, (arXiv:2212.05015)] to account for the geometry induced by the covariance of the input samples. This framework crucially relies on the robust estimators being sum-of-squares algorithms, and combining the two steps yields a sample-optimal private regression algorithm. We believe our techniques are of independent interest, and we demonstrate this by obtaining an efficient algorithm for covariance-aware mean estimation, with an optimal dependence on the privacy parameters.
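To make the problem setup concrete, the following sketch (not the paper's algorithm, and with all numerical choices illustrative) shows the non-private task the abstract refers to: ordinary least squares on Gaussian covariates with an unknown covariance Σ, where prediction error on a fresh sample corresponds to the Σ-weighted distance between the estimate and the true regression vector.

```python
# Illustrative sketch of the OLS-with-Gaussian-covariates setting.
# This is NOT the paper's private or robust algorithm; dimensions,
# sample size, and the choice of Sigma below are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 10_000

# "Unknown covariance structure": a generic positive-definite Sigma.
A = rng.standard_normal((d, d))
Sigma = A @ A.T / d

beta = rng.standard_normal(d)              # ground-truth regression vector
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
y = X @ beta + rng.standard_normal(n)      # Gaussian label noise

# Non-private OLS estimate via least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Expected prediction error on a fresh Gaussian sample x ~ N(0, Sigma) is
# E[(x @ (beta_hat - beta))^2] = (beta_hat - beta)^T Sigma (beta_hat - beta),
# i.e. error is measured in the geometry induced by Sigma.
err = (beta_hat - beta) @ Sigma @ (beta_hat - beta)
```

The paper's contribution is to achieve this covariance-induced error guarantee privately and robustly, with optimal sample complexity; the snippet only fixes the notion of prediction error being targeted.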