We develop theory and algorithms for analyzing privatized data under unbounded differential privacy (DP), where even the sample size is treated as a sensitive quantity that requires privacy protection. We show that the distance between the sampling distributions under unbounded DP and bounded DP goes to zero as the sample size $n$ goes to infinity, provided that the noise used to privatize $n$ is scaled at an appropriate rate; we also establish that Approximate Bayesian Computation (ABC)-type posterior distributions converge under similar assumptions. We further give asymptotic results in the regime where the privacy budget for $n$ goes to infinity, establishing similarity of the sampling distributions and showing that the MLE in the unbounded setting converges to the bounded-DP MLE. To facilitate valid, finite-sample Bayesian inference on privatized data under unbounded DP, we propose a reversible jump MCMC algorithm that extends the data augmentation MCMC of Ju et al. (2022). We also propose a Monte Carlo EM algorithm to compute the MLE from privatized data under both bounded and unbounded DP. We apply our methodology to a linear regression model and to a 2019 American Time Use Survey Microdata File, which we model using a Dirichlet distribution.