Estimating the probability density of a population while preserving the privacy of individuals in that population is an important and challenging problem that has received considerable attention in recent years. Whereas the previous literature has focused on frequentist approaches, we propose a Bayesian nonparametric mixture model under differential privacy (DP) and present two Markov chain Monte Carlo (MCMC) algorithms for posterior inference. One is a marginal approach, resembling Neal's Algorithm 5 with a pseudo-marginal Metropolis-Hastings move, and the other is a conditional approach. Although our focus is primarily on local DP, we show that our MCMC algorithms extend easily to global DP mechanisms. Moreover, for some carefully chosen mechanisms and mixture kernels, we show how auxiliary parameters can be marginalized analytically, allowing standard (i.e., non-private) MCMC algorithms, such as Neal's Algorithm 2, to be employed efficiently. Our approach is general and applicable to any mixture model and privacy mechanism. In several simulations and a real case study, we discuss the performance of our algorithms and evaluate different privacy mechanisms proposed in the frequentist literature.