In this work, we give efficient algorithms for privately estimating a Gaussian distribution in both the pure and approximate differential privacy (DP) models, with optimal dependence on the dimension in the sample complexity. In the pure DP setting, we give an efficient algorithm that estimates an unknown $d$-dimensional Gaussian distribution to within arbitrarily small total variation error using $\widetilde{O}(d^2 \log \kappa)$ samples while tolerating a constant fraction of adversarial outliers. Here, $\kappa$ is the condition number of the target covariance matrix. The sample bound matches that of the best non-private estimators in its dependence on the dimension (up to a polylogarithmic factor). We prove a new lower bound for differentially private covariance estimation to show that the dependence on the condition number $\kappa$ in the above sample bound is also tight. Prior to our work, only identifiability results (yielding inefficient, super-polynomial time algorithms) were known for the problem. In the approximate DP setting, we give an efficient algorithm that estimates an unknown Gaussian distribution to within arbitrarily small total variation error using $\widetilde{O}(d^2)$ samples while tolerating a constant fraction of adversarial outliers. Prior to our work, all efficient approximate DP algorithms either incurred a super-quadratic sample cost or were not outlier-robust. For the special case of mean estimation, our algorithm achieves the optimal sample complexity of $\widetilde{O}(d)$, improving on a $\widetilde{O}(d^{1.5})$ bound from prior work. Our pure DP algorithm relies on a recursive private preconditioning subroutine that builds on recent work on private mean estimation [Hopkins et al., 2022]. Our approximate DP algorithms are based on a substantial upgrade of the method of stabilizing convex relaxations introduced in [Kothari et al., 2022].