Differential privacy guarantees allow the results of a statistical analysis involving sensitive data to be released without compromising the privacy of any individual taking part. Achieving such guarantees generally requires injecting noise, either directly into parameter estimates or into the estimation process. Instead of introducing artificial perturbations, one can sample from a Bayesian posterior distribution. This has been shown to be a special case of the exponential mechanism, and it produces consistent and efficient private estimates without altering the data generating process. The application of current approaches has, however, been limited by strong bounding assumptions that do not hold for basic models such as simple linear regressors. To address this, we propose $\beta$D-Bayes, a scheme that samples from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process. This provides private estimation that is generally applicable, requires no changes to the underlying model, and consistently learns the data generating parameter. We show that $\beta$D-Bayes produces more precise inference for the same privacy guarantees. It also makes differentially private estimation via posterior sampling possible, for the first time, for complex classifiers and for continuous regression models such as neural networks.
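The following is a minimal sketch of why the $\beta$-divergence loss sidesteps the bounding assumptions. It is an illustration under stated assumptions, not the authors' implementation. For a Gaussian model $N(\mu, \sigma^2)$ with known $\sigma$, the $\beta$D loss is $\ell^\beta(y;\mu) = -\frac{1}{\beta} f(y;\mu)^\beta + \frac{1}{1+\beta}\int f(z;\mu)^{1+\beta}\,dz$. The integral equals $(2\pi\sigma^2)^{-\beta/2}(1+\beta)^{-1/2}$ and does not depend on $y$, so the loss is bounded in $y$ even though the log-likelihood is not. The sketch makes several further assumptions: it draws a single sample from a posterior proportional to $\exp(-w\sum_i \ell^\beta(y_i;\mu))$ under a uniform prior on a finite grid of means, and it bounds the privacy loss with the generic exponential-mechanism argument $\epsilon = 2w\Delta$, where $\Delta$ is the range of the loss. The weight `w` and the helper names (`beta_d_loss`, `loss_range`, `epsilon`, `sample_mu`) are hypothetical.

```python
import math
import random


def beta_d_loss(y, mu, sigma, beta):
    """beta-divergence loss of a Gaussian N(mu, sigma^2) model at observation y.

    l(y; mu) = -(1/beta) f(y)^beta + 1/(1+beta) * integral f(z)^(1+beta) dz,
    where the integral has the closed form (2*pi*sigma^2)^(-beta/2) / sqrt(1+beta).
    """
    dens = math.exp(-0.5 * ((y - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)
    integral = (2 * math.pi * sigma ** 2) ** (-beta / 2) / math.sqrt(1 + beta)
    return -dens ** beta / beta + integral / (1 + beta)


def loss_range(sigma, beta):
    # The y-dependent term lies in [-(1/beta) * max_density^beta, 0], with
    # max_density = (2*pi*sigma^2)^(-1/2). The integral term is constant in y, so
    # replacing one observation changes the summed loss by at most this amount.
    return (2 * math.pi * sigma ** 2) ** (-beta / 2) / beta


def epsilon(w, sigma, beta):
    # Generic exponential-mechanism bound for sampling proportional to
    # prior * exp(-w * summed loss) when the summed loss has sensitivity loss_range.
    return 2 * w * loss_range(sigma, beta)


def sample_mu(data, grid, sigma, beta, w, rng):
    # Unnormalised log-weights over a uniform prior on a grid of candidate means.
    logw = [-w * sum(beta_d_loss(y, m, sigma, beta) for y in data) for m in grid]
    top = max(logw)
    weights = [math.exp(lw - top) for lw in logw]  # subtract the max for numerical stability
    return rng.choices(grid, weights=weights, k=1)[0]


# Usage: 200 draws from N(1, 1) plus one gross outlier at 50.
rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(200)] + [50.0]
grid = [i / 100 for i in range(-300, 301)]
eps = epsilon(w=1.0, sigma=1.0, beta=0.5)
mu_draw = sample_mu(data, grid, sigma=1.0, beta=0.5, w=1.0, rng=rng)
print(eps, mu_draw)
```

Because the density term vanishes far from $\mu$, the outlier contributes an almost constant loss and barely shifts the posterior. Its influence on the privacy bound is capped by the same finite `loss_range`. Under the Gaussian log-likelihood the corresponding range would be unbounded.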