Privacy protection methods, such as differentially private mechanisms, introduce noise into the released statistics, which often results in complex and intractable sampling distributions. In this paper, we propose a simulation-based "repro sample" approach that produces statistically valid confidence intervals and hypothesis tests, building on the work of Xie and Wang (2022). We show that this methodology applies to a wide variety of private inference problems, appropriately accounts for biases introduced by privacy mechanisms (such as clamping), and improves on other state-of-the-art inference methods, such as the parametric bootstrap, in terms of the coverage and type I error of the private inference. We also develop significant improvements and extensions of the repro sample methodology for general models (not necessarily related to privacy), including 1) modifying the procedure to guarantee coverage and type I error control even after accounting for Monte Carlo error, and 2) proposing efficient numerical algorithms for computing the confidence intervals and $p$-values.
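To make the simulation-based idea concrete, the following is a minimal sketch and not the procedure developed in the paper. It assumes a toy model (N(θ, 1) data, a mean clamped to [lo, hi], Laplace noise with privacy parameter eps) and a simple grid search; all function names and parameter values are hypothetical. For each candidate θ, the private statistic is regenerated from a fixed set of random seeds, and θ is retained when the observed statistic is not extreme among the simulated ones. Counting the observed statistic as one of the R + 1 values is a standard rank-based way to keep the test valid despite Monte Carlo error, and simulating through the clamping step is what lets the interval account for the clamping bias.

```python
import numpy as np

def private_stats(theta, z, lap, lo, hi, eps):
    """Private clamped means for data theta + z, one per row of z.

    z   : (R, n) standard normal seeds, so each row is an N(theta, 1) sample
    lap : (R,) standard Laplace seeds for the privacy noise
    """
    n = z.shape[1]
    clamped_mean = np.clip(theta + z, lo, hi).mean(axis=1)
    # A clamped mean has sensitivity (hi - lo) / n, so the Laplace scale is that over eps.
    return clamped_mean + (hi - lo) / (n * eps) * lap

def p_value(theta, s_obs, z, lap, lo, hi, eps):
    """Two-sided rank-based p-value for the candidate theta."""
    sims = private_stats(theta, z, lap, lo, hi, eps)
    r = len(sims)
    # The "+ 1" counts the observed statistic itself among the r + 1 values.
    n_le = np.sum(sims <= s_obs) + 1
    n_ge = np.sum(sims >= s_obs) + 1
    return min(1.0, 2.0 * min(n_le, n_ge) / (r + 1))

def confidence_interval(s_obs, grid, z, lap, lo, hi, eps, alpha=0.05):
    """Hull of the grid points that are not rejected at level alpha."""
    kept = [t for t in grid if p_value(t, s_obs, z, lap, lo, hi, eps) > alpha]
    return (min(kept), max(kept)) if kept else None

# Toy usage: all values below are illustrative assumptions.
rng = np.random.default_rng(0)
n, R, lo, hi, eps = 100, 200, 0.0, 1.0, 1.0
# "Observed" private statistic, generated here at a true theta of 0.5.
s_obs = private_stats(0.5, rng.standard_normal((1, n)), rng.laplace(size=1), lo, hi, eps)[0]
# Seeds are drawn once and reused for every candidate theta.
z, lap = rng.standard_normal((R, n)), rng.laplace(size=R)
ci = confidence_interval(s_obs, np.linspace(-1.0, 2.0, 61), z, lap, lo, hi, eps)
print(ci)
```

Reusing the same seeds across candidates makes the simulated statistic a deterministic function of θ, which is what makes a search over θ well behaved. The grid search is only a placeholder for a more efficient numerical algorithm, and taking the hull of the accepted grid points can only widen the interval.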