We study a setting where a data holder wishes to share data with a receiver without revealing certain summary statistics of the data distribution (e.g., mean, standard deviation). The data holder achieves this by passing the data through a randomization mechanism. We propose summary statistic privacy, a metric that quantifies the privacy risk of such a mechanism as the worst-case probability that an adversary guesses the distributional secret to within some threshold. Defining distortion as the worst-case Wasserstein-1 distance between the real and released data, we prove lower bounds on the tradeoff between privacy and distortion. We then propose a class of quantization mechanisms that can be adapted to different data distributions. We show that the privacy-distortion tradeoff of these quantization mechanisms matches our lower bounds under certain regimes, up to small constant factors. Finally, we demonstrate on real-world datasets that the proposed quantization mechanisms achieve better privacy-distortion tradeoffs than alternative privacy mechanisms.
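To make the idea of a quantization mechanism concrete, the following is a minimal sketch, not the construction analyzed in the paper. It assumes the secret is the sample mean, that the range of possible means is split into bins of a hypothetical width `bin_width`, and that the data is released after shifting it so its mean sits at the midpoint of its bin. The function names `quantize_mean` and `wasserstein1` are illustrative, and the distortion is computed on empirical samples rather than as a worst case over distributions.

```python
import math
import random


def quantize_mean(samples, bin_width):
    """Shift samples so their mean lands on the midpoint of its bin.

    Illustrative assumption: the secret is the mean, and all datasets whose
    mean falls in the same bin are released with the same mean.
    """
    mean = sum(samples) / len(samples)
    bin_index = math.floor(mean / bin_width)
    midpoint = (bin_index + 0.5) * bin_width
    shift = midpoint - mean
    return [x + shift for x in samples]


def wasserstein1(xs, ys):
    """Empirical Wasserstein-1 distance for equal-size 1-D samples.

    In one dimension this is the mean absolute difference of sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Synthetic data for illustration only (not from the paper's experiments).
rng = random.Random(0)
data = [rng.gauss(3.2, 1.0) for _ in range(1000)]

released = quantize_mean(data, bin_width=1.0)
distortion = wasserstein1(data, released)
released_mean = sum(released) / len(released)
```

In this sketch the distortion equals the size of the shift, so it never exceeds half the bin width, while the released mean only reveals which bin the true mean fell into. Widening the bins hides the secret more coarsely at the cost of larger distortion, which is the kind of tradeoff the abstract refers to.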