Lost in the Averages: A New Specific Setup to Evaluate Membership Inference Attacks Against Machine Learning Models

Membership Inference Attacks (MIAs) are widely used to evaluate the propensity of a machine learning (ML) model to memorize an individual record and the privacy risk that releasing the model poses. MIAs are commonly evaluated in the same way as ML models: the MIA is run on a test set of models trained on datasets that were unseen during the MIA's own training and that are sampled from a larger pool, $D_{eval}$. The MIA is evaluated across all datasets in this test set, and is thus evaluated across the distribution of samples from $D_{eval}$. While this was a natural extension of ML evaluation to MIAs, recent work has shown that a record's risk depends heavily on the specific dataset it belongs to. For example, outliers are particularly vulnerable, yet an outlier in one dataset may not be one in another. The sources of randomness currently used to evaluate MIAs may thus lead to inaccurate individual privacy risk estimates. We propose a new, specific evaluation setup for MIAs against ML models, using weight initialization as the sole source of randomness. This allows us to accurately evaluate the risk associated with the release of a model trained on a specific dataset. Using SOTA MIAs, we empirically show that the risk estimates given by the current setup lead to many records being misclassified as low risk. We derive theoretical results which, combined with empirical evidence, suggest that the risk calculated in the current setup is an average of the risks specific to each sampled dataset, validating our use of weight initialization as the only source of randomness. Finally, we consider an MIA with a stronger adversary who leverages information about the target dataset to infer membership. Taken together, our results show that current MIA evaluation averages the risk across datasets, which leads to inaccurate risk estimates, and that the risk posed by attacks leveraging information about the target dataset may be underestimated.
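The averaging effect described in the abstract can be illustrated with a small arithmetic sketch. This is not the paper's method or data: the per-dataset risk scores, the dataset names `D1` to `D4`, the 0.8 threshold, and both helper functions are hypothetical, chosen only to show how a mean across sampled datasets can hide a record that is high risk in one specific dataset.

```python
from statistics import mean

# Hypothetical per-dataset risk scores (for example, an attack AUC) for ONE
# target record. Each key is a sampled dataset that contains the record.
# The values are invented for illustration and are not results from the paper.
specific_risk = {"D1": 0.52, "D2": 0.55, "D3": 0.93, "D4": 0.51}

HIGH_RISK_THRESHOLD = 0.8  # assumed cut-off, not taken from the source


def averaged_risk(risks):
    """Risk as an across-dataset evaluation would report it: a plain mean."""
    return mean(risks.values())


def hidden_high_risk_datasets(risks, threshold):
    """Datasets where the record is high risk although the average says low risk."""
    if averaged_risk(risks) >= threshold:
        return []  # the average already flags the record, nothing is hidden
    return [name for name, risk in risks.items() if risk >= threshold]


if __name__ == "__main__":
    print("averaged risk:", averaged_risk(specific_risk))
    print("hidden high-risk datasets:",
          hidden_high_risk_datasets(specific_risk, HIGH_RISK_THRESHOLD))
```

In this toy setting the averaged score falls below the threshold, so the record would be labelled low risk, even though a model trained on `D3` specifically exposes it. Fixing the dataset and varying only the weight initialization, as the abstract proposes, amounts to estimating each per-dataset value on its own rather than their mean.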

