We investigate unbiased high-dimensional mean estimators in differential privacy. We consider differentially private mechanisms whose expected output equals the mean of the input dataset, for every dataset drawn from a fixed bounded $d$-dimensional domain $K$. A classical approach to private mean estimation is to compute the true mean and add unbiased, but possibly correlated, Gaussian noise to it. In the first part of this paper, we study the optimal error achievable by a Gaussian noise mechanism for a given domain $K$, when the error is measured in the $\ell_p$ norm for some $p \ge 2$. We give algorithms that compute the optimal covariance of the Gaussian noise for a given $K$ under suitable assumptions, and prove several geometric properties of the optimal error. These results generalize the theory of factorization mechanisms from domains $K$ that are symmetric and finite (or, equivalently, symmetric polytopes) to arbitrary bounded domains. In the second part of the paper, we show that Gaussian noise mechanisms achieve nearly optimal error among all private unbiased mean estimation mechanisms in a very strong sense. In particular, for every input dataset, an unbiased mean estimator satisfying concentrated differential privacy incurs, up to approximation, at least as much error as the best Gaussian noise mechanism. We extend this result to local differential privacy and to approximate differential privacy. For approximate differential privacy, the error lower bound holds either for a dataset or for a neighboring dataset, and this relaxation is necessary.
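The Gaussian noise mechanism described above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch under stated assumptions, not the paper's algorithm: the function names `gaussian_noise_mean` and `zcdp_scaled_cov` are hypothetical, the domain $K$ is assumed to be given as a finite list of points, neighboring datasets are assumed to differ by replacing one point, and the noise shape matrix is supplied by the caller rather than optimized. The scaling relies on the standard fact that adding $N(0, \Sigma)$ noise to a function whose sensitivity is $\Delta$ in the norm $\|\Sigma^{-1/2}\,\cdot\,\|_2$ satisfies $(\Delta^2/2)$-concentrated differential privacy.

```python
import numpy as np


def gaussian_noise_mean(data, cov, rng):
    """Return mean(data) + N(0, cov).

    The noise has mean zero, so the output is unbiased for any covariance.
    """
    data = np.asarray(data, dtype=float)
    true_mean = data.mean(axis=0)
    noise = rng.multivariate_normal(np.zeros(data.shape[1]), cov)
    return true_mean + noise


def zcdp_scaled_cov(shape, domain_points, n, rho):
    """Scale an invertible PSD `shape` matrix so the mechanism is rho-zCDP.

    Assumptions (not from the source): the domain is the finite set
    `domain_points`, datasets have n points, and neighbors differ by
    replacing one point, so the means differ by (x - x') / n.
    """
    pts = np.asarray(domain_points, dtype=float)
    # All pairwise differences x - x', scaled to differences of means.
    diffs = (pts[:, None, :] - pts[None, :, :]).reshape(-1, pts.shape[1]) / n
    inv = np.linalg.inv(shape)
    # Squared sensitivity in the norm induced by shape^{-1}.
    sens_sq = np.einsum("ij,jk,ik->i", diffs, inv, diffs).max()
    # With covariance c * shape, the zCDP parameter is sens_sq / (2 c).
    return shape * sens_sq / (2.0 * rho)
```

For example, with `shape = np.eye(2)`, the four points $(\pm 1, 0)$ and $(0, \pm 1)$ as the domain, `n = 10`, and `rho = 0.5`, the largest scaled difference has squared norm $0.04$, so the returned covariance is $0.04 \cdot I$. Choosing a non-identity `shape` matched to the geometry of $K$ is where the optimization studied in the first part of the paper would enter.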