The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clipping controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clipping also introduces statistical bias. We prove that this tradeoff is inherent: no algorithm can simultaneously have low bias, low variance, and low privacy loss for arbitrary distributions. On the positive side, we show that unbiased mean estimation is possible under approximate differential privacy if we assume that the distribution is symmetric. Furthermore, we show that, even if we assume that the data is sampled from a Gaussian, unbiased mean estimation is impossible under pure or concentrated differential privacy.
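The clip-then-add-noise recipe described above can be sketched as follows. This is a minimal illustration, not the paper's construction: the function name `private_clipped_mean`, its parameters, the choice of Laplace noise (which gives pure differential privacy), and the replace-one-sample neighboring relation are all assumptions made for the example.

```python
import numpy as np


def private_clipped_mean(samples, lower, upper, epsilon, rng):
    """Hypothetical helper: clip samples to [lower, upper], then add Laplace noise.

    Assumes neighboring datasets differ by replacing one sample, so the
    clipped empirical mean has sensitivity (upper - lower) / n.
    """
    x = np.clip(np.asarray(samples, dtype=float), lower, upper)
    n = x.size
    sensitivity = (upper - lower) / n
    # Laplace noise with scale sensitivity / epsilon yields epsilon-DP.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return x.mean() + noise


rng = np.random.default_rng(0)

# A skewed distribution (exponential, true mean 1.0) makes the clipping bias visible.
data = rng.exponential(scale=1.0, size=10_000)

clipped_mean = np.clip(data, 0.0, 2.0).mean()
estimates = np.array(
    [private_clipped_mean(data, 0.0, 2.0, 1.0, rng) for _ in range(200)]
)

print("empirical mean:      ", data.mean())
print("clipped mean:        ", clipped_mean)
print("avg private estimate:", estimates.mean())
```

Under these assumptions, a tighter clipping range shrinks the noise scale but pulls the clipped mean further from the true mean, while a wider range does the opposite. That is the bias-variance-privacy tension the abstract refers to.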