Data sharing between different parties has become increasingly common across industry and academia. An important class of privacy concerns that arises in data sharing scenarios relates to the underlying distribution of the data. For example, the total traffic volume in a networking company's data can reveal the scale of its business, which may be considered a trade secret. Unfortunately, existing privacy frameworks (e.g., differential privacy, anonymization) do not adequately address such concerns. In this paper, we propose summary statistic privacy, a framework for analyzing and mitigating these concerns. We propose a class of quantization mechanisms that can be tailored to various data distributions and statistical secrets, and analyze their privacy-distortion tradeoffs under our framework. We prove corresponding lower bounds on the privacy-utility tradeoff, which match the tradeoffs of the quantization mechanisms in certain regimes, up to small constant factors. Finally, we demonstrate that the proposed quantization mechanisms achieve better privacy-distortion tradeoffs than alternative privacy mechanisms on real-world datasets.