This paper introduces a new method that takes any Bayesian model used to generate synthetic data and converts it into a differentially private (DP) mechanism. We propose altering the model synthesizer to use a censored likelihood that is bounded below and above by $[\exp(-\epsilon / 2), \exp(\epsilon / 2)]$, where $\epsilon$ denotes the level of the DP guarantee. This censoring mechanism, which carries an $\epsilon$-DP guarantee, distorts the joint parameter posterior distribution by flattening it or shifting it towards a weakly informative prior. To minimize the distortion induced by likelihood censoring, we embed a vector-weighted pseudo posterior mechanism within the censoring mechanism. The pseudo posterior is formulated by selectively downweighting each likelihood contribution in proportion to its disclosure risk. On its own, the pseudo posterior mechanism produces a weaker asymptotic differential privacy (aDP) guarantee. Once it is embedded in the censoring mechanism, the DP guarantee becomes strict in the sense that it no longer relies on asymptotics. We demonstrate that the pseudo posterior mechanism creates synthetic data with the highest utility at the price of a weaker aDP guarantee, while embedding the pseudo posterior mechanism in the proposed censoring mechanism produces synthetic data with a stronger, non-asymptotic DP guarantee at the cost of slightly reduced utility. The perturbed histogram mechanism is included for comparison.
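The combination of risk-based weighting and likelihood censoring described above can be illustrated with a minimal sketch. It assumes that censoring is applied per record on the log scale, so that each weighted log-likelihood contribution is clipped to $[-\epsilon/2, \epsilon/2]$; the abstract does not spell out this detail, so treat it as an assumption. The function name `censored_weighted_loglik`, the Gaussian example model, and the weight values are hypothetical and chosen only for illustration.

```python
import numpy as np


def censored_weighted_loglik(loglik, weights, epsilon):
    """Downweight, then censor, per-record log-likelihood contributions.

    Assumption (not stated in the abstract): censoring acts on each record's
    weighted log-likelihood, clipping it to [-epsilon/2, epsilon/2], which
    corresponds to likelihood values in [exp(-epsilon/2), exp(epsilon/2)].
    """
    loglik = np.asarray(loglik, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if np.any((weights < 0) | (weights > 1)):
        raise ValueError("weights must lie in [0, 1]")
    bound = epsilon / 2.0
    # Pseudo-likelihood step: high-risk records receive weights closer to 0.
    weighted = weights * loglik
    # Censoring step: enforce the hard bound regardless of the data.
    return np.clip(weighted, -bound, bound)


# Hypothetical example: standard normal model with one outlying record.
x = np.array([0.1, -0.3, 5.0])
loglik = -0.5 * np.log(2.0 * np.pi) - 0.5 * x**2
weights = np.array([1.0, 1.0, 0.1])  # illustrative risk-based weights
contrib = censored_weighted_loglik(loglik, weights, epsilon=2.0)
total = contrib.sum()  # censored pseudo log-likelihood at this parameter value
```

In this sketch the weighting step already pulls the outlying record close to the bound, so the censoring step changes it only slightly. That mirrors the motivation given above for embedding the pseudo posterior inside the censoring mechanism: less of the likelihood is altered by censoring alone.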