We prove optimal lower bounds for two well-known parameter estimation (also known as statistical estimation) tasks in high dimensions under approximate differential privacy. First, we prove that for any $\alpha \le O(1)$, estimating the covariance of a Gaussian up to spectral error $\alpha$ requires $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} + \frac{d}{\alpha^2}\right)$ samples, which is tight up to logarithmic factors. This improves over previous work, which established the bound only for $\alpha \le O\left(\frac{1}{\sqrt{d}}\right)$, and our proof is also simpler. Next, we prove that estimating the mean of a heavy-tailed distribution with bounded $k$th moments requires $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \frac{d}{\alpha^2}\right)$ samples. Previous work on this problem was only able to establish this lower bound against pure differential privacy, or in the special case $k = 2$. Our techniques follow the fingerprinting method and are generally quite simple. Our lower bound for heavy-tailed estimation is based on a black-box reduction from privately estimating identity-covariance Gaussians. Our lower bound for covariance estimation uses a Bayesian approach to show that, under an Inverse Wishart prior on the covariance matrix, no private estimator can be accurate, even in expectation, without sufficiently many samples.
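As a reading aid, the two terms in each bound can be compared directly to see when the privacy term dominates the non-private term $\frac{d}{\alpha^2}$. The short calculation below is only a sketch derived from the stated bounds: it ignores the constants and logarithmic factors hidden by $\tilde{\Omega}$, and it assumes $\varepsilon \le 1$ and $k \ge 2$, which the text above does not state explicitly.

$$
\frac{d^{3/2}}{\alpha \varepsilon} \ge \frac{d}{\alpha^2}
\iff \alpha \sqrt{d} \ge \varepsilon
\iff \alpha \ge \frac{\varepsilon}{\sqrt{d}},
$$

$$
\frac{d}{\alpha^{k/(k-1)} \varepsilon} \ge \frac{d}{\alpha^2}
\iff \alpha^{\,2 - \frac{k}{k-1}} \ge \varepsilon
\iff \alpha^{\frac{k-2}{k-1}} \ge \varepsilon .
$$

Under these assumptions, the privacy term in the covariance bound dominates whenever $\alpha \ge \varepsilon / \sqrt{d}$. For mean estimation with $k = 2$ the exponent $\frac{k-2}{k-1}$ is zero, so the condition reduces to $1 \ge \varepsilon$ and the privacy term $\frac{d}{\alpha^2 \varepsilon}$ dominates for every $\alpha$. For $k > 2$ the privacy term dominates when $\alpha \ge \varepsilon^{(k-1)/(k-2)}$.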