This paper introduces a new method that takes any Bayesian model used to generate synthetic data and converts it into a differentially private (DP) mechanism. We propose altering the model synthesizer to use a censored likelihood whose values are bounded to the interval $[\exp(-\epsilon / 2), \exp(\epsilon / 2)]$, where $\epsilon$ denotes the level of the DP guarantee. This censoring mechanism carries an $\epsilon$-DP guarantee, but it distorts the joint parameter posterior distribution by flattening it or shifting it towards a weakly informative prior. To minimize the distortion induced by likelihood censoring, we embed a vector-weighted pseudo posterior mechanism within the censoring mechanism. The pseudo posterior is formulated by selectively downweighting each likelihood contribution in proportion to its disclosure risk. On its own, the pseudo posterior mechanism produces a weaker, asymptotic differential privacy (aDP) guarantee. Once it is embedded in the censoring mechanism, the DP guarantee becomes strict, in that it no longer relies on asymptotics. We demonstrate that the pseudo posterior mechanism creates synthetic data with the highest utility at the price of a weaker, aDP guarantee, while embedding the pseudo posterior mechanism in the proposed censoring mechanism produces synthetic data with a stronger, non-asymptotic DP guarantee at the cost of slightly reduced utility. The perturbed histogram mechanism is included for comparison.
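The interaction between censoring and downweighting described above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions that are not in the abstract: a univariate normal model with known parameters, hand-picked weights (rather than weights computed from disclosure risk), and censoring applied to the weighted per-record log-likelihood, so that each term lies in $[-\epsilon/2, \epsilon/2]$ on the log scale. The function names `censor_loglik`, `normal_loglik`, and `censored_pseudo_loglik` are hypothetical. The sketch only evaluates the censored log pseudo likelihood; it does not sample from a posterior and provides no privacy guarantee by itself.

```python
import math


def censor_loglik(loglik, epsilon):
    """Clip one log-likelihood term to [-epsilon/2, epsilon/2].

    On the likelihood scale this matches the interval
    [exp(-epsilon/2), exp(epsilon/2)] stated in the abstract.
    """
    bound = epsilon / 2.0
    return max(-bound, min(bound, loglik))


def normal_loglik(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x (illustrative model choice)."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def censored_pseudo_loglik(data, weights, mu, sigma, epsilon):
    """Sum of censored, weighted per-record log-likelihoods.

    Assumption: each weight lies in [0, 1] and multiplies the record's
    log-likelihood before censoring is applied.
    """
    total = 0.0
    for x, w in zip(data, weights):
        total += censor_loglik(w * normal_loglik(x, mu, sigma), epsilon)
    return total


# Hypothetical data: three typical records and one outlier (8.0).
data = [0.1, -0.4, 0.3, 8.0]
epsilon = 4.0

# Without downweighting, the outlier's term is censored at -epsilon/2.
unweighted = censored_pseudo_loglik(data, [1.0, 1.0, 1.0, 1.0], 0.0, 1.0, epsilon)

# With a small illustrative weight on the outlier, its term stays inside
# the bounds, so censoring no longer alters it.
weighted = censored_pseudo_loglik(data, [1.0, 1.0, 1.0, 0.05], 0.0, 1.0, epsilon)

print(unweighted, weighted)
```

In this toy setting, the downweighted outlier contributes a term that already lies within the censoring bounds, which is one way to see how downweighting high-risk records may reduce how often censoring has to intervene.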