A Bayesian Approach to Membership Inference for Statistical Release

The membership inference problem for publicly released statistics computed from a private dataset is well studied. When attack strategies are developed and formally analyzed, however, the focus has been on attacks that model the population using only its marginals. In practice, these attacks can perform well on a variety of populations, but most formal analysis assumes that the population follows a product distribution. Such strategies may fail to exploit information about the population that is important for understanding a realistic privacy threat. In this work, we explore the impact of giving an attacker additional information about the attribute dependency structure of the population, motivated by settings in which multiple parties may have access to similarly structured data, for example the US Census and the IRS. To model this scenario, we reframe the membership inference problem with respect to a population represented as a Bayesian network (BN). We develop a framework based on Bayesian decision-making that can incorporate prior information about the population to launch more effective, specialized attacks. To evaluate our framework, we introduce a specific attack instantiation that computes the Bayesian posterior using a probabilistic program, and we prove its equivalence to an optimal variant of the likelihood ratio test attack for two populations with strong attribute dependency. We implement our program in the Roulette probabilistic programming language and show experimentally that it outperforms the likelihood ratio test and inner product attacks on five commonly used BNs, where the population dependency structure is too complex for the existing attacks to be manually adapted.
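To make the Bayesian framing concrete, the following is a minimal Python sketch of a posterior-based membership decision on a toy population. It is not the Roulette program described above. It assumes a population over two binary attributes, a release consisting of the exact per-attribute counts over a dataset of n records, and an attacker who knows the joint distribution. The function names (count_distribution, membership_posterior, product_of_marginals) and the exhaustive enumeration are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def count_distribution(joint, m):
    """Distribution of the released counts (sum of A, sum of B) over m i.i.d. records.

    `joint` maps each record (a, b) with a, b in {0, 1} to its probability.
    """
    dist = {(0, 0): 1.0}
    for _ in range(m):
        nxt = defaultdict(float)
        for (sa, sb), p in dist.items():
            for (a, b), q in joint.items():
                nxt[(sa + a, sb + b)] += p * q
        dist = dict(nxt)
    return dist


def membership_posterior(joint, n, stats, target, prior=0.5):
    """Posterior probability that `target` is one of the n records behind `stats`.

    Assumption: under membership, the other n - 1 records are i.i.d. from `joint`;
    under non-membership, all n records are i.i.d. from `joint`.
    """
    sa, sb = stats
    ta, tb = target
    like_in = count_distribution(joint, n - 1).get((sa - ta, sb - tb), 0.0)
    like_out = count_distribution(joint, n).get((sa, sb), 0.0)
    evidence = prior * like_in + (1 - prior) * like_out
    # If the release is impossible under the model, fall back to the prior.
    return prior if evidence == 0 else prior * like_in / evidence


def product_of_marginals(joint):
    """Marginals-only model: same marginals as `joint`, attributes independent."""
    pa = sum(q for (a, _), q in joint.items() if a == 1)
    pb = sum(q for (_, b), q in joint.items() if b == 1)
    return {
        (a, b): (pa if a else 1 - pa) * (pb if b else 1 - pb)
        for a in (0, 1)
        for b in (0, 1)
    }


# Toy population with strong dependency: the two attributes always agree.
dependent = {(0, 0): 0.5, (1, 1): 0.5}

# The same release and target, scored under the two population models.
with_structure = membership_posterior(dependent, n=2, stats=(2, 2), target=(1, 1))
marginals_only = membership_posterior(
    product_of_marginals(dependent), n=2, stats=(2, 2), target=(1, 1)
)
```

The two calls at the end score the same release under the dependency-aware model and under a marginals-only model, so the effect of the attacker's population knowledge on the posterior can be inspected directly. The enumeration is practical only for very small n and few attributes, which is one reason a probabilistic programming language is a natural fit for larger BNs.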
