Privacy protection methods, such as differentially private mechanisms, introduce noise into the resulting statistics, which often produces complex and intractable sampling distributions. In this paper, we propose using the simulation-based "repro sample" approach to produce statistically valid confidence intervals and hypothesis tests based on privatized statistics. We show that this methodology applies to a wide variety of private inference problems, appropriately accounts for biases introduced by privacy mechanisms (such as clamping), and improves on other state-of-the-art inference methods, such as the parametric bootstrap, in terms of the coverage and type I error of the private inference. We also develop significant improvements and extensions of the repro sample methodology for general models (not necessarily related to privacy), including 1) modifying the procedure to guarantee coverage and type I error control, even accounting for Monte Carlo error, and 2) proposing efficient numerical algorithms to compute the confidence intervals and $p$-values.
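To make the idea of simulation-based inference on a privatized statistic concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a toy model that does not appear in the abstract: data drawn from $N(\theta, 1)$, clamped to $[lo, hi]$, averaged, and released with Laplace noise. The function names `privatized_mean` and `simulation_ci`, the grid search, and all parameter values are hypothetical choices made for illustration. The sketch fixes one set of random "seeds", reuses them for every candidate $\theta$, and keeps the candidates for which the observed statistic has a non-extreme rank among the simulated ones. Because the rank is computed among $R + 1$ values, the rejection probability at the true parameter is at most $\alpha$ for any finite $R$ in this toy setting.

```python
import numpy as np

def privatized_mean(x, lo, hi, eps, laplace_draw):
    """Clamp to [lo, hi], average, then add Laplace noise.

    The noise scale (hi - lo) / (n * eps) matches the sensitivity of a
    clamped mean; laplace_draw is a standard Laplace variate (scale 1).
    """
    n = x.shape[-1]
    return np.clip(x, lo, hi).mean(axis=-1) + (hi - lo) / (n * eps) * laplace_draw

def simulation_ci(s_obs, n, lo, hi, eps, grid, R=199, alpha=0.05, seed=0):
    """Grid-search confidence interval for theta (illustrative assumption:
    data are N(theta, 1); the abstract does not specify this model)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((R, n))   # data seeds, reused for every theta
    lap = rng.laplace(size=R)         # privacy-noise seeds, also reused
    # Number of extreme ranks rejected in each tail, out of R + 1 ranks.
    m = int(np.floor(alpha * (R + 1) / 2))
    accepted = []
    for theta in grid:
        # Simulate the full pipeline, including clamping, at this theta.
        s_sim = privatized_mean(theta + z, lo, hi, eps, lap)
        rank = 1 + np.sum(s_sim < s_obs)   # rank of s_obs among R + 1 values
        if m < rank <= R + 1 - m:
            accepted.append(float(theta))
    return (min(accepted), max(accepted)) if accepted else None

# Usage: one privatized release from n = 100 points with true theta = 1.
rng = np.random.default_rng(123)
x = 1.0 + rng.standard_normal(100)
s_obs = privatized_mean(x, 0.0, 3.0, 1.0, rng.laplace())
ci = simulation_ci(s_obs, n=100, lo=0.0, hi=3.0, eps=1.0,
                   grid=np.linspace(-2.0, 4.0, 601))
print(ci)
```

Because the simulation runs the clamping step itself, the bias that clamping introduces is reflected in the simulated statistics rather than corrected separately. The grid search here is only a stand-in; the abstract states that the paper proposes more efficient numerical algorithms.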