Privacy protection methods, such as differentially private mechanisms, introduce noise into the resulting statistics, which often yields complex and intractable sampling distributions. In this paper, we propose using the simulation-based "repro sample" approach to produce statistically valid confidence intervals and hypothesis tests based on privatized statistics. We show that this methodology applies to a wide variety of private inference problems, appropriately accounts for biases introduced by privacy mechanisms (such as clamping), and improves on other state-of-the-art inference methods, such as the parametric bootstrap, in terms of the coverage and type I error of the private inference. We also develop significant improvements and extensions of the repro sample methodology for general models (not necessarily related to privacy), including 1) modifying the procedure to guarantee coverage and type I error control, even accounting for Monte Carlo error, and 2) proposing efficient numerical algorithms to compute the confidence intervals and $p$-values.
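To make the simulation-based idea concrete, the following is a minimal sketch of Monte Carlo test inversion for a privatized mean, written in the spirit of the repro sample approach but not reproducing the paper's procedure. It assumes a toy model chosen for illustration: $X_1, \dots, X_n \sim N(\theta, 1)$, each value clamped to $[-c, c]$, and the clamped mean released with Laplace noise of scale $2c/(n\varepsilon)$ (the sensitivity of a clamped mean under replace-one neighbors). One fixed set of random seeds is reused for every candidate $\theta$, so each synthetic release is a deterministic function of $(\theta, \text{seed})$; clamping bias is handled automatically because the simulation applies the same clamp. The $(1 + \text{count})/(R + 1)$ correction keeps each Monte Carlo $p$-value valid despite finite $R$. All function names (`privatized_mean`, `simulate_releases`, `mc_pvalue`, `confidence_set`) and the grid search are hypothetical choices for this sketch.

```python
import numpy as np

def privatized_mean(x, clamp, epsilon, rng):
    """Release mean(clip(x)) plus Laplace noise (epsilon-DP, replace-one neighbors)."""
    n = len(x)
    clipped = np.clip(x, -clamp, clamp)
    scale = 2.0 * clamp / (n * epsilon)  # L1 sensitivity of the clamped mean is 2*clamp/n
    return clipped.mean() + rng.laplace(0.0, scale)

def simulate_releases(theta, n, clamp, epsilon, z, w):
    """Map fixed seeds to privatized releases at theta.

    z: (R, n) standard normal seeds for the data; w: (R,) unit Laplace seeds for the noise.
    Toy model assumption: X = theta + Z with known unit variance.
    """
    x = theta + z                                   # R synthetic datasets
    clamped_means = np.clip(x, -clamp, clamp).mean(axis=1)
    scale = 2.0 * clamp / (n * epsilon)
    return clamped_means + scale * w                # same mechanism as privatized_mean

def mc_pvalue(s_obs, sims):
    """Two-sided Monte Carlo p-value; the +1 terms keep it valid for finite R."""
    R = len(sims)
    p_low = (1 + np.sum(sims <= s_obs)) / (R + 1)
    p_high = (1 + np.sum(sims >= s_obs)) / (R + 1)
    return min(1.0, 2 * min(p_low, p_high))        # Bonferroni over the two tails

def confidence_set(s_obs, n, clamp, epsilon, grid, alpha=0.05, R=999, seed=0):
    """Keep every grid value of theta whose test is not rejected at level alpha.

    Returns the hull (min, max) of the accepted values, which can only enlarge
    the set and so preserves coverage; returns None if nothing is accepted.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((R, n))                 # seeds shared across all theta
    w = rng.laplace(0.0, 1.0, size=R)
    keep = [t for t in grid
            if mc_pvalue(s_obs, simulate_releases(t, n, clamp, epsilon, z, w)) > alpha]
    return (min(keep), max(keep)) if keep else None

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    theta_true, n, clamp, eps = 0.8, 100, 1.0, 1.0
    data = rng.normal(theta_true, 1.0, size=n)
    s_obs = privatized_mean(data, clamp, eps, rng)
    print(confidence_set(s_obs, n, clamp, eps, np.linspace(-3, 3, 601)))
```

Reusing the same seeds across the grid makes the accepted region vary smoothly in $\theta$ and avoids re-drawing randomness at each candidate; the abstract's efficient numerical algorithms would replace the brute-force grid used here.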