Several membership inference (MI) attacks have been proposed to audit a target DNN. Given a set of subjects, an MI attack identifies which of them the target DNN saw during training. This work focuses on post-training MI attacks that emphasize high-confidence membership detection, measured as the True Positive Rate (TPR) at a low False Positive Rate (FPR). Current works in this category, the likelihood ratio attack (LiRA) and the enhanced MI attack (EMIA), perform well only on complex datasets (e.g., CIFAR-10 and ImageNet) where the target DNN overfits its training set, and perform poorly on simpler datasets: at 1% FPR, both attacks achieve 0% TPR on Fashion-MNIST, and LiRA and EMIA achieve 2% and 0% TPR, respectively, on MNIST. To address this, we first unify current MI attacks under a framework divided into three stages: preparation, indication, and decision. Second, we use the framework to propose two novel attacks: (1) the Adversarial Membership Inference Attack (AMIA), which efficiently utilizes the membership and non-membership information of the subjects while adversarially minimizing a novel loss function, achieving 6% TPR on both Fashion-MNIST and MNIST; and (2) Enhanced AMIA (E-AMIA), which combines EMIA and AMIA to achieve 8% and 4% TPR on Fashion-MNIST and MNIST, respectively, at 1% FPR. Third, we introduce two novel augmented indicators that positively leverage the loss information in the Gaussian neighborhood of a subject. These improve the TPR of all four attacks by 2.5% and 0.25% on average on Fashion-MNIST and MNIST, respectively, at 1% FPR. Finally, we propose a simple yet novel evaluation metric, the running TPR average (RTA) at a given FPR, that better distinguishes different MI attacks in the low-FPR region. We also show that AMIA and E-AMIA transfer better to unknown DNNs (other than the target DNN) and are more robust to DP-SGD training than LiRA and EMIA.
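The headline metric in the abstract, TPR at a fixed low FPR, can be illustrated with a minimal sketch. It assumes each subject receives a scalar attack score where a higher value means "more likely a member"; the function names, the thresholding convention, and the toy scores are illustrative assumptions, not taken from the paper. The `running_tpr_average` helper is only one plausible reading of the RTA (the mean TPR over a grid of FPR values up to the given FPR); the paper's exact definition may differ.

```python
import numpy as np


def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """TPR at the strictest threshold whose FPR does not exceed target_fpr.

    Assumption: a higher score means "more likely a member".
    """
    members = np.asarray(member_scores, dtype=float)
    nonmembers = np.asarray(nonmember_scores, dtype=float)
    # Number of non-members we may flag without exceeding the FPR budget
    # (small epsilon guards against floating-point rounding).
    k = int(np.floor(target_fpr * nonmembers.size + 1e-9))
    if k >= nonmembers.size:
        return 1.0  # every subject may be flagged
    # Flag a subject only if it scores strictly above the (k+1)-th largest
    # non-member score, so at most k non-members are flagged.
    threshold = np.sort(nonmembers)[::-1][k]
    return float(np.mean(members > threshold))


def running_tpr_average(member_scores, nonmember_scores, max_fpr, num_points=10):
    """Hypothetical RTA: mean TPR over an evenly spaced FPR grid in (0, max_fpr]."""
    fprs = np.linspace(max_fpr / num_points, max_fpr, num_points)
    tprs = [tpr_at_fpr(member_scores, nonmember_scores, f) for f in fprs]
    return float(np.mean(tprs))


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    members = [0.9, 0.8, 0.4, 0.3]
    nonmembers = [0.85, 0.5, 0.2, 0.1]
    print(tpr_at_fpr(members, nonmembers, 0.25))
    print(running_tpr_average(members, nonmembers, 0.5, num_points=2))
```

Averaging over several FPR values, rather than reading the TPR at a single operating point, is one way a metric of this kind could separate attacks whose TPR at exactly 1% FPR happens to coincide.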