We develop a univariate, differentially private mean estimator, called the private modified winsorized mean, designed to be used as the aggregator in subsample-and-aggregate. We demonstrate, via a real-data analysis, that common differentially private multivariate mean estimators may not perform well as the aggregator, even with a dataset of 8000 observations, which motivates our developments. We show that the modified winsorized mean is minimax optimal for several large classes of distributions, even under adversarial contamination. We also demonstrate empirically that the modified winsorized mean performs well compared to other private mean estimators. We then consider the modified winsorized mean as the aggregator in subsample-and-aggregate and derive a finite-sample deviation bound for a subsample-and-aggregate estimate generated with the new aggregator. This result yields two important insights: (i) the optimal choice of subsamples depends on the bias of the estimator computed on the subsamples, and (ii) the rate of convergence of the subsample-and-aggregate estimator depends on the robustness of the estimator computed on the subsamples.
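The abstract does not define the modified winsorized mean, so the following is only a generic sketch of the subsample-and-aggregate pattern, not the authors' estimator. As a stand-in aggregator it uses a plain clipped mean (winsorizing to fixed limits) with Laplace noise, and it assumes the clipping bounds `lo` and `hi` are public and fixed in advance; a practical estimator would have to choose them privately, which this sketch does not attempt. All function names here are hypothetical.

```python
import random
import statistics
from typing import Callable, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_clipped_mean(values: Sequence[float], lo: float, hi: float,
                         epsilon: float, rng: random.Random) -> float:
    """epsilon-DP mean of values clipped to the public interval [lo, hi].

    Stand-in aggregator (NOT the paper's modified winsorized mean).
    Replacing one value moves the clipped mean by at most (hi - lo) / k,
    so Laplace noise with scale (hi - lo) / (k * epsilon) suffices.
    """
    if not values:
        raise ValueError("need at least one value")
    if lo >= hi or epsilon <= 0:
        raise ValueError("require lo < hi and epsilon > 0")
    k = len(values)
    clipped = [min(max(v, lo), hi) for v in values]
    sensitivity = (hi - lo) / k
    return sum(clipped) / k + laplace_noise(sensitivity / epsilon, rng)


def subsample_and_aggregate(data: Sequence[float],
                            estimator: Callable[[Sequence[float]], float],
                            k: int, lo: float, hi: float, epsilon: float,
                            rng: random.Random) -> float:
    """Split data into k disjoint blocks, estimate on each, aggregate privately."""
    if not 1 <= k <= len(data):
        raise ValueError("need 1 <= k <= len(data)")
    shuffled = list(data)
    rng.shuffle(shuffled)  # data-independent random partition
    # Disjoint blocks: each record lands in exactly one block, so changing
    # one record changes at most one of the k subsample estimates.
    blocks = [shuffled[i::k] for i in range(k)]
    estimates = [estimator(block) for block in blocks]
    return private_clipped_mean(estimates, lo, hi, epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data for illustration only.
    data = [rng.gauss(5.0, 2.0) for _ in range(8000)]
    est = subsample_and_aggregate(data, statistics.fmean, k=100,
                                  lo=0.0, hi=10.0, epsilon=1.0, rng=rng)
    print(f"private subsample-and-aggregate estimate: {est:.3f}")
```

In this sketch the noise scale `(hi - lo) / (k * epsilon)` shrinks as the number of subsamples `k` grows, while each subsample estimate is computed on only about `n / k` records. That is the kind of trade-off a deviation bound for subsample-and-aggregate has to balance when choosing the subsamples.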