We consider the problem of ensuring the confidentiality of dataset properties that are aggregated over many records. Such properties can encode sensitive information, such as trade secrets or demographic data, while involving a notion of data protection different from the privacy of individual records typically discussed in the literature. In this work, we demonstrate how a distribution privacy framework can be applied to formalize such data confidentiality. We extend the Wasserstein Mechanism from Pufferfish privacy and the Gaussian Mechanism from attribute privacy to this framework, then analyze their underlying data assumptions and how these assumptions can be relaxed. We then empirically evaluate the privacy-utility tradeoffs of these mechanisms and apply them against a practical property inference attack that targets global properties of datasets. The results show that our mechanisms can indeed reduce the effectiveness of the attack while providing utility substantially greater than a crude group differential privacy baseline. Our work thus provides groundwork for theoretical mechanisms for protecting global properties of datasets along with their evaluation in practice.