Privacy protection methods, such as differentially private mechanisms, introduce noise into the released statistics, which often results in complex and intractable sampling distributions. In this paper, we propose to use the simulation-based "repro sample" approach to produce statistically valid confidence intervals and hypothesis tests based on privatized statistics. We show that this methodology is applicable to a wide variety of private inference problems, appropriately accounts for biases introduced by privacy mechanisms (such as clamping), and improves on other state-of-the-art inference methods, such as the parametric bootstrap, in terms of the coverage and type I error of the private inference. We also develop significant improvements and extensions of the repro sample methodology for general models (not necessarily related to privacy), including 1) modifying the procedure to guarantee coverage and type I error rates, even after accounting for Monte Carlo error, and 2) proposing efficient numerical algorithms to compute the confidence intervals and $p$-values.
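To make the setting concrete, the following is a minimal sketch of simulation-based test inversion for a privatized statistic. It is not the algorithm developed in the paper. Everything specific in it is an assumption chosen for illustration: the unit-variance normal model, the clamped mean released through the Laplace mechanism, the clamping bounds, the grid search, and all function names. The statistic is regenerated at each candidate parameter value from a fixed set of seeds, and a candidate is kept when the rank of the observed statistic among the simulated ones is not extreme. Because the clamping is applied inside the simulation, its bias is reproduced rather than ignored, and the rank cutoff is chosen so that the rejection probability at the true parameter is at most `alpha` for a finite number of simulations.

```python
import random

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def laplace_noise(rng, scale):
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_mean(data, lo, hi, epsilon, rng):
    # Clamped mean plus Laplace noise; replacing one record changes the
    # clamped mean by at most (hi - lo) / n, which sets the noise scale.
    n = len(data)
    clamped_mean = sum(clamp(x, lo, hi) for x in data) / n
    return clamped_mean + laplace_noise(rng, (hi - lo) / (n * epsilon))

def simulate_statistic(theta, seed, n, lo, hi, epsilon):
    # Assumed model: data are N(theta, 1). Reusing a seed across values of
    # theta reuses the same underlying randomness (common random numbers).
    rng = random.Random(seed)
    data = [rng.gauss(theta, 1.0) for _ in range(n)]
    return private_mean(data, lo, hi, epsilon, rng)

def confidence_interval(s_obs, grid, seeds, alpha, n, lo, hi, epsilon):
    # Keep theta when the rank of s_obs among R simulated statistics is
    # central. At the true theta the rank is uniform on {0, ..., R}, so
    # excluding k ranks in each tail rejects with probability
    # 2k / (R + 1) <= alpha.
    R = len(seeds)
    k = int(alpha * (R + 1) / 2)
    accepted = []
    for theta in grid:
        rank = sum(
            simulate_statistic(theta, s, n, lo, hi, epsilon) < s_obs
            for s in seeds
        )
        if k <= rank <= R - k:
            accepted.append(theta)
    if not accepted:
        return None
    return min(accepted), max(accepted)

# Hypothetical usage: one observed privatized statistic, then inversion.
n, lo, hi, epsilon = 50, -3.0, 3.0, 1.0
s_obs = simulate_statistic(0.5, seed=12345, n=n, lo=lo, hi=hi, epsilon=epsilon)
grid = [-2.0 + 0.125 * i for i in range(41)]   # candidate values of theta
seeds = list(range(200))                       # fixed simulation seeds
interval = confidence_interval(s_obs, grid, seeds, 0.05, n, lo, hi, epsilon)
print("observed statistic:", s_obs)
print("approximate 95% interval for theta:", interval)
```

The grid search is only a stand-in for a proper numerical search, and the reported interval is the hull of the accepted grid points, so its resolution is limited by the grid spacing.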