Differential privacy guarantees allow the results of a statistical analysis involving sensitive data to be released without compromising the privacy of any individual taking part. Achieving such guarantees generally requires the injection of noise, either directly into parameter estimates or into the estimation process. As an alternative to artificially introducing perturbations, sampling from Bayesian posterior distributions has been shown to be a special case of the exponential mechanism, producing consistent and efficient private estimates without altering the data-generating process. The application of current approaches has, however, been limited by their strong bounding assumptions, which do not hold for basic models such as simple linear regressors. To ameliorate this, we propose $\beta$D-Bayes, a scheme that samples from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data-generating process. This provides private estimation that is generally applicable, requires no changes to the underlying model, and consistently learns the data-generating parameter. We show that $\beta$D-Bayes produces more precise estimates for the same privacy guarantees and, for the first time, facilitates differentially private estimation via posterior sampling for complex classifiers and continuous regression models such as neural networks.
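To illustrate why a $\beta$-divergence loss may sidestep the bounding assumptions mentioned above, the following minimal sketch compares it with the log-likelihood loss for a Gaussian model. It assumes the standard density power form of the loss, $-\frac{1}{\beta-1} f(x;\theta)^{\beta-1} + \frac{1}{\beta}\int f(z;\theta)^{\beta}\,dz$ with $\beta > 1$, which may differ from the exact parametrisation used in the paper. The function names and the fixed-variance Gaussian model are illustrative assumptions. The sketch only evaluates losses; it is not a posterior sampler and carries no privacy guarantee by itself.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_loss(x, mu, sigma):
    """Negative Gaussian log-likelihood, written analytically to avoid log(0)."""
    z = (x - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + 0.5 * z * z


def beta_loss(x, mu, sigma, beta):
    """Assumed beta-divergence loss for a Gaussian model (requires beta > 1).

    The integral of the density raised to the power beta has the closed form
    (2 * pi * sigma^2)^(-(beta - 1) / 2) / sqrt(beta) for a Gaussian.
    """
    if beta <= 1.0:
        raise ValueError("this sketch assumes beta > 1")
    density_term = gaussian_pdf(x, mu, sigma) ** (beta - 1.0) / (beta - 1.0)
    integral = (2.0 * math.pi * sigma ** 2) ** (-(beta - 1.0) / 2.0) / math.sqrt(beta)
    return -density_term + integral / beta


def beta_loss_range_bound(sigma, beta):
    """Largest possible change in beta_loss when one observation is replaced.

    The density lies in [0, (2 * pi * sigma^2)^(-1/2)], so the loss varies by
    at most max_density^(beta - 1) / (beta - 1), whatever the observation is.
    """
    max_density = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return max_density ** (beta - 1.0) / (beta - 1.0)


if __name__ == "__main__":
    mu, sigma, beta = 0.0, 1.0, 1.5
    for x in (0.0, 1.0, 10.0, 1e3, 1e6):
        print(x, log_loss(x, mu, sigma), beta_loss(x, mu, sigma, beta))
    print("bound on beta-loss range:", beta_loss_range_bound(sigma, beta))
```

Under these assumptions, the log-likelihood loss grows without bound as an observation moves away from the mean, whereas the $\beta$-divergence loss stays within a fixed range for any observation, because the Gaussian density is itself bounded.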