Membership Inference Attacks (MIAs) are widely used to evaluate the propensity of a machine learning (ML) model to memorize an individual record and the privacy risk that releasing the model poses. MIAs are commonly evaluated similarly to ML models: the MIA is performed on a test set of models trained on datasets unseen during training, which are sampled from a larger pool, $D_{eval}$. The MIA is evaluated across all datasets in this test set, and thus across the distribution of samples from $D_{eval}$. While this was a natural extension of ML evaluation to MIAs, recent work has shown that a record's risk depends heavily on its specific dataset. For example, outliers are particularly vulnerable, yet an outlier in one dataset may not be one in another. The sources of randomness currently used to evaluate MIAs may thus lead to inaccurate individual privacy risk estimates. We propose a new, specific evaluation setup for MIAs against ML models, using weight initialization as the sole source of randomness. This allows us to accurately evaluate the risk associated with the release of a model trained on a specific dataset. Using state-of-the-art MIAs, we empirically show that the risk estimates given by the current setup lead to many records being misclassified as low risk. We derive theoretical results which, combined with empirical evidence, suggest that the risk calculated in the current setup is an average of the risks specific to each sampled dataset, validating our use of weight initialization as the only source of randomness. Finally, we consider an MIA with a stronger adversary that leverages information about the target dataset to infer membership. Taken together, our results show that current MIA evaluation averages the risk across datasets, leading to inaccurate risk estimates, and that the risk posed by attacks leveraging information about the target dataset may be underestimated.
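To make the averaging argument concrete, the following is a minimal sketch rather than the paper's method. It assumes a single target record whose dataset-specific risk (for example, an attack success rate measured with weight initialization as the only source of randomness) is already known for each sampled dataset. The dataset names, risk values, and the `HIGH_RISK_THRESHOLD` cut-off are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical per-dataset risk for ONE target record: each value stands for
# the attack's success rate against that record when it sits in a given
# dataset, with weight initialization as the only source of randomness.
# These numbers are invented for illustration; they are not paper results.
risk_per_dataset = {
    "dataset_A": 0.95,  # the record is an outlier in this dataset
    "dataset_B": 0.52,
    "dataset_C": 0.50,
    "dataset_D": 0.51,
}

HIGH_RISK_THRESHOLD = 0.80  # assumed cut-off, used only in this example


def averaged_risk(risks):
    """Risk as an averaging setup would report it: the mean over datasets."""
    return mean(risks.values())


def high_risk_datasets(risks, threshold):
    """Datasets in which the record's dataset-specific risk exceeds threshold."""
    return [name for name, r in risks.items() if r > threshold]


avg = averaged_risk(risk_per_dataset)
flagged = high_risk_datasets(risk_per_dataset, HIGH_RISK_THRESHOLD)

print(f"averaged risk: {avg:.2f}")
print(f"high risk on average: {avg > HIGH_RISK_THRESHOLD}")
print(f"datasets where the record is high risk: {flagged}")
```

In this toy setting the averaged score falls below the threshold even though the record is highly exposed in one dataset, which is the kind of misclassification as low risk that the abstract describes.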