Privacy protection methods, such as differentially private mechanisms, introduce noise into the resulting statistics, which often produces complex and intractable sampling distributions. In this paper, we propose a simulation-based "repro sample" approach, building on the work of Xie and Wang (2022), to produce statistically valid confidence intervals and hypothesis tests. We show that this methodology is applicable to a wide variety of private inference problems, appropriately accounts for biases introduced by privacy mechanisms (such as clamping), and improves on other state-of-the-art inference methods, such as the parametric bootstrap, in terms of the coverage and type I error of the private inference. We also develop significant improvements and extensions to the repro sample methodology for general models (not necessarily related to privacy), including 1) modifying the procedure to guarantee coverage and type I error control, even accounting for Monte Carlo error, and 2) proposing efficient numerical algorithms to implement the confidence intervals and $p$-values.
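The abstract does not spell out the procedure. As a rough illustration of the general idea behind simulation-based private inference, the Python sketch below builds a confidence set by inverting a Monte Carlo test over a grid of candidate parameter values. It is not the repro sample method of this paper or of Xie and Wang (2022), and it does not implement the paper's Monte Carlo error correction or its numerical algorithms. The Gaussian model, the clamping bounds, the Laplace-noised clamped mean, and every function name (`private_mean`, `simulated_confidence_set`, and so on) are assumptions for illustration. The sketch reruns the entire pipeline (data generation, clamping, and noise) at each candidate value, so the clamping bias is reflected in the simulated statistics rather than corrected separately.

```python
import random


def clamp(v, lo, hi):
    """Clamp a value to the interval [lo, hi]."""
    return min(max(v, lo), hi)


def standard_laplace(rng):
    """Draw a Laplace(0, 1) variate as the difference of two Exp(1) variates."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def private_mean(x, lo, hi, epsilon, laplace_draw):
    """Clamped mean plus Laplace noise with scale (hi - lo) / (n * epsilon).

    With n public and replace-one neighbouring datasets, the clamped mean has
    sensitivity (hi - lo) / n, so this release is epsilon-DP. The Laplace(0, 1)
    draw is passed in so the same noise can be reused across candidate thetas.
    """
    n = len(x)
    clamped_mean = sum(clamp(v, lo, hi) for v in x) / n
    return clamped_mean + (hi - lo) / (n * epsilon) * laplace_draw


def simulated_confidence_set(s_obs, n, lo, hi, epsilon, grid,
                             alpha=0.05, n_sims=499, seed=0):
    """Keep every grid value theta whose Monte Carlo test is not rejected.

    Assumed model (illustration only): x_i = theta + z_i with z_i ~ N(0, 1).
    One set of simulated (z, noise) draws is shared by all candidate thetas.
    """
    rng = random.Random(seed)
    sims = [([rng.gauss(0.0, 1.0) for _ in range(n)], standard_laplace(rng))
            for _ in range(n_sims)]
    accepted = []
    for theta in grid:
        # Re-run the whole pipeline (data, clamping, noise) at this theta.
        stats = [private_mean([theta + zi for zi in z], lo, hi, epsilon, w)
                 for z, w in sims]
        # One-sided Monte Carlo p-values; the "+1" terms keep each one valid.
        p_low = (1 + sum(s <= s_obs for s in stats)) / (n_sims + 1)
        p_high = (1 + sum(s >= s_obs for s in stats)) / (n_sims + 1)
        # Doubling the smaller one gives a valid two-sided p-value.
        p_value = min(1.0, 2 * min(p_low, p_high))
        if p_value > alpha:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    data_rng = random.Random(42)
    n, lo, hi, epsilon = 30, -1.0, 3.0, 1.0
    # Hypothetical data with true theta = 1.
    x = [data_rng.gauss(1.0, 1.0) for _ in range(n)]
    s_obs = private_mean(x, lo, hi, epsilon, standard_laplace(data_rng))
    grid = [i / 10 for i in range(-10, 31)]  # candidate thetas in [-1, 3]
    accepted = simulated_confidence_set(s_obs, n, lo, hi, epsilon, grid)
    if accepted:
        print(f"95% confidence set on the grid: [{min(accepted)}, {max(accepted)}]")
    else:
        print("no grid value accepted")
```

Under the assumed model, the test at any grid value that equals the true theta has level at most `alpha`, because the observed and simulated statistics are exchangeable. Values between grid points are not tested. The inference uses only the released statistic and public quantities (`n`, `lo`, `hi`, `epsilon`), so it is post-processing and consumes no additional privacy budget. Reusing one set of draws for every theta makes each simulated statistic nondecreasing in theta, so the accepted points form a contiguous stretch of the grid.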