Formal disclosure avoidance techniques are necessary to ensure that published data cannot be used to identify information about individuals. Adding statistical noise to unpublished data can achieve differential privacy, which provides a formal mathematical privacy guarantee. However, the infusion of noise makes data releases less precise than they would be without noise, and it can render some individual data points nonsensical. Examples include negative estimates of population counts and estimates of ratios of counts that violate known constraints. A straightforward way to guarantee that published estimates satisfy these known constraints is to specify a statistical model and incorporate a prior on census counts and ratios that properly constrains the parameter space. We use rejection sampling to draw samples from the posterior distribution, and we show that this implementation produces estimates of population counts and ratios that maintain formal privacy, are more precise than the original unconstrained noisy measurements, and are guaranteed to satisfy the prior constraints.
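The constrained-posterior idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Laplace noise with a known, public scale, a flat prior over the region 0 <= sub <= total (so the ratio sub/total lies in [0, 1]), and continuous rather than integer counts. The names `laplace`, `sample_posterior`, `noisy_sub`, and `noisy_total` are hypothetical. Under these assumptions the posterior is the noise density centered at the noisy measurement and truncated to the constrained region, so proposing from the untruncated density and rejecting out-of-region draws samples it exactly.

```python
import random


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sample_posterior(noisy_sub, noisy_total, scale, n_draws, rng,
                     max_tries=1_000_000):
    """Rejection-sample (sub, total) given noisy measurements.

    Assumed model (illustrative only): each measurement is the true value
    plus independent Laplace(0, scale) noise, with a flat prior on the
    region 0 <= sub <= total.
    """
    draws = []
    tries = 0
    while len(draws) < n_draws:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("acceptance rate too low for this input")
        # Propose from the unconstrained posterior: measurement minus noise.
        sub = noisy_sub - laplace(scale, rng)
        total = noisy_total - laplace(scale, rng)
        # Accept only proposals that satisfy the prior constraints.
        if 0.0 <= sub <= total:
            draws.append((sub, total))
    return draws


# Hypothetical noisy measurements: the subgroup count came out negative.
rng = random.Random(0)
draws = sample_posterior(noisy_sub=-1.3, noisy_total=4.2, scale=2.0,
                         n_draws=5000, rng=rng)

# Posterior means serve as constrained point estimates.
post_sub = sum(s for s, _ in draws) / len(draws)
post_total = sum(t for _, t in draws) / len(draws)
ratios = [s / t for s, t in draws if t > 0.0]
post_ratio = sum(ratios) / len(ratios)
print(post_sub, post_total, post_ratio)
```

Because the sampler reads only the noisy measurements and the public noise scale, it is post-processing of a differentially private release and so does not weaken the privacy guarantee. Every accepted draw satisfies the constraints, and so do the posterior means, since the constrained region is convex.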