The ability to share data, even in aggregated form, is critical to advancing both conventional and data science. However, because such datasets consist of individuals' data, membership in them is often viewed as sensitive, and membership inference attacks (MIAs) threaten to violate individuals' privacy. We propose a Bayesian game model for privacy-preserving publishing of data-sharing mechanism outputs (for example, summary statistics for sharing genomic data). In this game, the defender minimizes a combination of expected utility loss and privacy loss, with the latter being maximized by a Bayes-rational attacker. We propose a GAN-style algorithm to approximate a Bayes-Nash equilibrium of this game, and introduce the notions of Bayes-Nash generative privacy (BNGP) and Bayes generative privacy (BGP) risk, which aim to optimally balance the defender's privacy and utility in a way that is robust to the attacker's heterogeneous preferences with respect to true and false positives. We demonstrate the properties of composition and post-processing for BGP risk and establish conditions under which BNGP and pure differential privacy (PDP) are equivalent. We apply our method to sharing summary statistics, where MIAs can re-identify individuals even from aggregated data. Theoretical analysis and empirical results demonstrate that our Bayesian game-theoretic method outperforms state-of-the-art approaches for privacy-preserving sharing of summary statistics.
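To make the defender-attacker trade-off in the abstract concrete, the following is a minimal toy sketch and not the GAN-style algorithm the abstract proposes. It assumes a single membership bit released through randomized response (flip probability `q`), a stand-in mechanism chosen only because the Bayes-rational attacker's best response can then be computed exactly. All names (`attacker_value`, `defender_loss`, `best_flip`, `epsilon`) and parameters (`theta` for the membership prior, `gamma` for the attacker's false-positive cost relative to a true-positive gain of 1, `kappa` for the defender's utility weight) are hypothetical and introduced for illustration.

```python
import math

def attacker_value(q, theta=0.5, gamma=1.0):
    """Expected payoff of a Bayes-rational attacker (toy assumption).

    The mechanism reports the true membership bit with probability 1 - q.
    For each observed output y, the attacker claims membership only when
    the expected true-positive gain exceeds the false-positive cost.
    """
    total = 0.0
    for y in (0, 1):
        # Joint probabilities P(member, y) and P(non-member, y).
        p_member = theta * ((1 - q) if y == 1 else q)
        p_non = (1 - theta) * (q if y == 1 else (1 - q))
        # Best response: claim membership only if it pays off in expectation.
        total += max(0.0, p_member - gamma * p_non)
    return total

def defender_loss(q, kappa=0.5, **attacker_kwargs):
    """Utility loss (weighted flip probability) plus privacy loss."""
    return kappa * q + attacker_value(q, **attacker_kwargs)

def best_flip(kappa=0.5, grid=101, **attacker_kwargs):
    """Grid search over q in [0, 0.5] for the defender's minimum loss."""
    qs = [0.5 * i / (grid - 1) for i in range(grid)]
    return min(qs, key=lambda q: defender_loss(q, kappa, **attacker_kwargs))

def epsilon(q):
    """Pure-DP parameter of randomized response with flip probability q."""
    if q <= 0.0:
        return math.inf  # no noise, so no finite privacy guarantee
    return math.log((1 - q) / q)

if __name__ == "__main__":
    for kappa in (0.5, 2.0):
        q_star = best_flip(kappa=kappa)
        print(kappa, q_star, defender_loss(q_star, kappa), epsilon(q_star))
```

In this toy setting a defender who weights utility lightly (`kappa` below 1) flips as much as allowed, while one who weights it heavily releases the bit unperturbed. Varying `gamma` shows how the attacker's preference over true and false positives shifts the defender's best choice.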