Privacy protection methods, such as differentially private mechanisms, introduce noise into the resulting statistics, which often leads to complex and intractable sampling distributions. In this paper, we propose a simulation-based "repro sample" approach, building on the work of Xie and Wang (2022), to produce statistically valid confidence intervals and hypothesis tests. We show that this methodology applies to a wide variety of private inference problems, appropriately accounts for biases introduced by privacy mechanisms (such as clamping), and improves on other state-of-the-art inference methods, such as the parametric bootstrap, in terms of the coverage and type I error of the private inference. We also develop significant improvements and extensions to the repro sample methodology for general models (not necessarily related to privacy), including 1) modifying the procedure to guarantee coverage and type I error control, even accounting for Monte Carlo error, and 2) proposing efficient numerical algorithms to implement the confidence intervals and $p$-values.
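To make the general idea of simulation-based inference for a private statistic concrete, here is a minimal sketch of Monte Carlo test inversion. It is not the paper's algorithm. The model is a hypothetical assumption: normal data with unit variance, each record clamped to [lo, hi], and a mean released with the Laplace mechanism under substitution-neighbor DP. The function names `privatize`, `mc_pvalue`, and `confidence_set` and the grid search are illustrative, and the grid search is only a crude stand-in for the efficient numerical algorithms mentioned above. In the spirit of repro samples, one fixed set of simulated noise draws is reused for every candidate θ. The sketch uses the standard rank-based Monte Carlo p-value (1 + count)/(R + 1), which stays valid for finite R. The paper's own modification for Monte Carlo error may differ.

```python
import numpy as np


def privatize(theta, z, w, lo, hi, eps):
    """Release a clamped mean of theta + z plus Laplace noise scale * w.

    Hypothetical model: X_i = theta + Z_i with Z_i ~ N(0, 1), each record
    clamped to [lo, hi]; z has shape (..., n) and w matches its leading shape.
    """
    n = z.shape[-1]
    clamped = np.clip(theta + z, lo, hi)
    # Changing one record moves the clamped mean by at most (hi - lo) / n,
    # so the Laplace scale for epsilon-DP is that sensitivity divided by eps.
    scale = (hi - lo) / (n * eps)
    return clamped.mean(axis=-1) + scale * w


def mc_pvalue(theta, s_obs, Z, W, lo, hi, eps):
    """Two-sided rank-based Monte Carlo p-value for H0: parameter == theta."""
    sims = privatize(theta, Z, W, lo, hi, eps)  # R simulated releases
    R = sims.shape[0]
    # (1 + count) / (R + 1) is a valid one-sided p-value for finite R when the
    # observed and simulated releases are exchangeable under H0.
    p_low = (1 + np.sum(sims <= s_obs)) / (R + 1)
    p_high = (1 + np.sum(sims >= s_obs)) / (R + 1)
    # Doubling the smaller one-sided p-value keeps the two-sided test valid.
    return float(min(1.0, 2 * min(p_low, p_high)))


def confidence_set(s_obs, grid, n, lo, hi, eps, alpha=0.05, R=999, seed=0):
    """Invert the Monte Carlo test over a grid of candidate theta values."""
    rng = np.random.default_rng(seed)
    # Draw the simulation noise once and reuse it for every candidate theta
    # (the "fixed seeds" idea behind repro samples).
    Z = rng.standard_normal((R, n))
    W = rng.laplace(size=R)
    kept = [t for t in grid if mc_pvalue(t, s_obs, Z, W, lo, hi, eps) > alpha]
    # Report the hull of the accepted grid points (it can only enlarge the set).
    return (float(min(kept)), float(max(kept))) if kept else None


# Usage: one private release generated at a hypothetical true theta = 1.0.
rng = np.random.default_rng(42)
n, lo, hi, eps = 100, -1.0, 3.0, 1.0
s_obs = privatize(1.0, rng.standard_normal(n), rng.laplace(), lo, hi, eps)
grid = np.linspace(-1.0, 3.0, 401)
ci = confidence_set(s_obs, grid, n, lo, hi, eps)
print(ci)
```

Because the simulations pass through the same clamping and noise steps as the observed release, any bias from clamping is reflected in the simulated distribution rather than ignored.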