Data analysis is highly valuable for both commercial and research purposes. However, disclosing analysis results may pose severe privacy risks to individuals. Privug is a method for quantifying the privacy risks of data analytics programs by analyzing their source code. The method uses probability distributions to model attacker knowledge and Bayesian inference to update that knowledge based on observable outputs. Currently, Privug performs inference with Markov Chain Monte Carlo (MCMC), which is flexible but approximate. This paper presents an exact Bayesian inference engine based on multivariate Gaussian distributions to quantify privacy risks accurately and efficiently. The inference engine is implemented for a subset of Python programs that can be modeled as multivariate Gaussian models. We evaluate the method by analyzing privacy risks in programs that release public statistics. The evaluation shows that our method analyzes privacy risks accurately and efficiently, and that it outperforms existing methods. Furthermore, we demonstrate the use of our engine to analyze the effect of differential privacy on public statistics.
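To illustrate why Gaussian models admit exact inference, the following minimal sketch conditions an attacker's Gaussian prior about one individual on a released average. It is not Privug's API: the names `Gaussian` and `posterior_of_victim` are hypothetical, the attacker's priors are assumed to be independent Gaussians, and the optional noise term assumes additive zero-mean Gaussian noise as a stand-in for a privacy mechanism. Under these assumptions the posterior follows in closed form from the standard conditioning rule for jointly Gaussian variables, with no sampling involved.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    """A univariate Gaussian described by its mean and variance."""
    mean: float
    var: float


def posterior_of_victim(prior_means, prior_vars, observed_mean,
                        noise_var=0.0, victim=0):
    """Exact posterior of one record given a released average.

    Illustrative model (an assumption, not taken from the paper):
      x_i ~ N(prior_means[i], prior_vars[i]), mutually independent
      y   = (1/n) * sum(x_i) + e,  e ~ N(0, noise_var)
    Because (x_victim, y) is jointly Gaussian, conditioning on y is exact.
    """
    n = len(prior_means)
    mu_y = sum(prior_means) / n                  # E[y]
    var_y = sum(prior_vars) / n**2 + noise_var   # Var[y]
    if var_y <= 0:
        raise ValueError("the released output must have positive variance")
    cov_xy = prior_vars[victim] / n              # Cov[x_victim, y]

    # Standard Gaussian conditioning:
    #   mean' = mu_x + (cov_xy / var_y) * (y_obs - mu_y)
    #   var'  = var_x - cov_xy**2 / var_y
    gain = cov_xy / var_y
    post_mean = prior_means[victim] + gain * (observed_mean - mu_y)
    post_var = prior_vars[victim] - gain * cov_xy
    return Gaussian(post_mean, post_var)


# Four individuals; the attacker believes each age is N(40, 100).
means = [40.0, 40.0, 40.0, 40.0]
variances = [100.0, 100.0, 100.0, 100.0]

exact_release = posterior_of_victim(means, variances, observed_mean=45.0)
noisy_release = posterior_of_victim(means, variances, observed_mean=45.0,
                                    noise_var=25.0)
print(exact_release)
print(noisy_release)
```

Comparing the posterior variance with the prior variance gives one simple measure of how much the attacker learns. In this sketch, adding noise to the released average leaves the posterior variance closer to the prior, which means the attacker gains less knowledge about the individual.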