Using real-world study data usually requires contractual agreements under which research results may only be published in anonymized form. Formal privacy guarantees, such as differential privacy, could help data-driven projects comply with such data protection requirements. However, deploying differential privacy in consumer use cases raises the need to explain its underlying mechanisms and the resulting privacy guarantees. In this paper, we thoroughly review and extend an existing privacy risk metric. We show how to compute this metric efficiently for a set of basic statistical queries. Our empirical analysis, based on an extensive real-world scientific data set, expands the knowledge on how to compute risks under realistic conditions, while revealing more challenges than solutions.