Differentially private (DP) mechanisms protect individual-level information by introducing randomness into the statistical analysis procedure. Despite the availability of numerous DP tools, there remains a lack of general techniques for conducting statistical inference under DP. We examine a DP bootstrap procedure that releases multiple private bootstrap estimates to infer the sampling distribution and construct confidence intervals (CIs). Our privacy analysis presents new results on the privacy cost of a single DP bootstrap estimate, applicable to any DP mechanism, and identifies some misapplications of the bootstrap in the existing literature. Using the Gaussian-DP (GDP) framework (Dong et al., 2022), we show that the release of $B$ DP bootstrap estimates from mechanisms satisfying $(\mu/\sqrt{(2-2/\mathrm{e})B})$-GDP asymptotically satisfies $\mu$-GDP as $B$ goes to infinity. Moreover, we use deconvolution with the DP bootstrap estimates to accurately infer the sampling distribution, which is novel in DP. We derive CIs from our density estimate for tasks such as population mean estimation, logistic regression, and quantile regression, and we compare them to existing methods using simulations and real-world experiments on 2016 Canada Census data. Our private CIs achieve the nominal coverage level and offer the first approach to private inference for quantile regression.
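To make the privacy accounting concrete, the following is a minimal Python sketch of how the per-estimate GDP parameter $\mu/\sqrt{(2-2/\mathrm{e})B}$ could be used to calibrate noise when releasing $B$ private bootstrap estimates. It is an illustration under stated assumptions, not the authors' implementation: the statistic is assumed to be a sample mean of data clipped to $[0, 1]$, the base mechanism is assumed to be the Gaussian mechanism (which satisfies $\mu_b$-GDP when its noise scale is the sensitivity divided by $\mu_b$), and the function names `per_estimate_mu` and `dp_bootstrap_means` are hypothetical. The overall $\mu$-GDP guarantee is asymptotic in $B$, and the deconvolution step used to recover the sampling distribution is not shown.

```python
import math
import random


def per_estimate_mu(total_mu: float, B: int) -> float:
    """GDP parameter for each of the B mechanisms, following the
    asymptotic rule stated in the abstract: mu / sqrt((2 - 2/e) * B)."""
    return total_mu / math.sqrt((2.0 - 2.0 / math.e) * B)


def dp_bootstrap_means(data, total_mu, B, lower=0.0, upper=1.0, rng=None):
    """Release B noisy bootstrap means (hypothetical helper, for illustration).

    Assumes a mean statistic on data clipped to [lower, upper] and a
    Gaussian base mechanism; returns the estimates and the noise scale.
    """
    rng = rng or random.Random()
    n = len(data)
    clipped = [min(max(x, lower), upper) for x in data]

    mu_b = per_estimate_mu(total_mu, B)
    # Replacing one of n values in [lower, upper] moves the mean by at most this.
    sensitivity = (upper - lower) / n
    # Gaussian mechanism: noise scale sensitivity / mu_b gives mu_b-GDP.
    sigma = sensitivity / mu_b

    estimates = []
    for _ in range(B):
        # Resample n points with replacement, then privatize the mean.
        resample = [clipped[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n + rng.gauss(0.0, sigma))
    return estimates, sigma


if __name__ == "__main__":
    toy_rng = random.Random(0)
    toy_data = [toy_rng.random() for _ in range(500)]  # synthetic toy data
    est, sigma = dp_bootstrap_means(toy_data, total_mu=1.0, B=100, rng=toy_rng)
    print(len(est), sigma)
```

Since $2 - 2/\mathrm{e} \approx 1.264$, each mechanism in this sketch receives a somewhat smaller GDP parameter than an even split $\mu/\sqrt{B}$ would give, so the noise added to each bootstrap estimate is correspondingly larger.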