We introduce One-Shot Label-Only (OSLO) membership inference attacks (MIAs), which infer with high precision whether a given sample belongs to a target model's training set using just \emph{a single query}, where the target model returns only the predicted hard label. In contrast, state-of-the-art label-only attacks require $\sim6000$ queries yet attain lower attack precision than OSLO. OSLO leverages transfer-based black-box adversarial attacks. The core idea is that a member sample is more resistant to adversarial perturbations than a non-member. We compare OSLO against state-of-the-art label-only attacks and demonstrate that, despite requiring only one query, our method significantly outperforms them in precision and in true positive rate (TPR) at the same false positive rate (FPR). For example, on CIFAR100 with a ResNet18 model, OSLO achieves a TPR at least 7$\times$ higher at 1\% FPR and at least 22$\times$ higher at 0.1\% FPR than previous label-only MIAs. We also evaluate multiple defense mechanisms against OSLO.
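The core idea can be illustrated with a minimal sketch. It assumes a toy linear target and a toy linear surrogate with binary labels; the names `craft_adversarial`, `infer_membership`, `toy_target`, and the fixed `epsilon` are hypothetical and chosen only for illustration. The abstract does not specify how the perturbation is crafted or how its budget is calibrated, so the sign-gradient step below is a stand-in, not the authors' procedure.

```python
import numpy as np


def craft_adversarial(x, y, surrogate_w, epsilon):
    """Sign-gradient step on a hypothetical linear surrogate (labels in {0, 1}).

    For a linear score s = w . x, moving against sign(w) lowers the score
    (pushing label 1 toward 0), and moving along sign(w) raises it.
    """
    direction = -np.sign(surrogate_w) if y == 1 else np.sign(surrogate_w)
    return x + epsilon * direction


def infer_membership(query_label, x, y, surrogate_w, epsilon):
    """Predict 'member' if the target's hard label survives the perturbation."""
    x_adv = craft_adversarial(x, y, surrogate_w, epsilon)
    predicted = query_label(x_adv)  # the single hard-label query to the target
    return predicted == y


# Toy target model (assumption): a linear classifier exposing only hard labels.
target_w = np.array([1.0, 1.0])


def toy_target(x):
    return int(target_w @ x > 0)


# Surrogate weights differ slightly from the target's, mimicking a transfer setting.
surrogate_w = np.array([0.9, 1.1])

far_from_boundary = np.array([2.0, 2.0])   # stands in for a robust (member-like) sample
near_boundary = np.array([0.2, 0.2])       # stands in for a fragile (non-member-like) sample

member_guess = infer_membership(toy_target, far_from_boundary, 1, surrogate_w, 0.5)
non_member_guess = infer_membership(toy_target, near_boundary, 1, surrogate_w, 0.5)
```

In this toy setting the sample far from the decision boundary keeps its label after perturbation and is flagged as a member, while the sample near the boundary flips and is flagged as a non-member. A realistic attack would craft the perturbation on surrogate neural networks and choose the budget per sample to control the false positive rate.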