Membership inference attacks (MIAs) threaten the privacy of machine learning models by revealing whether a specific data point was used during training. Existing MIAs often rely on impractical assumptions, such as access to public datasets, shadow models, confidence scores, or knowledge of the training data distribution, making them vulnerable to defenses like confidence masking and adversarial regularization. Label-only MIAs, although they operate under strict constraints, suffer from high per-sample query requirements. We propose a cost-effective label-only MIA framework based on transferability and model extraction. By querying the target model M using active sampling, perturbation-based selection, and synthetic data, we extract a functionally similar surrogate S on which membership inference is performed. This shifts the query overhead to a one-time extraction phase, eliminating repeated queries to M. Operating under strict black-box constraints, our method matches the performance of state-of-the-art label-only MIAs while significantly reducing query costs. On benchmarks including Purchase, Location, and Texas Hospital, we show that a query budget equivalent to testing $\approx1\%$ of training samples suffices to extract S and achieve membership inference accuracy within $\pm1\%$ of the accuracy obtained on M. We also evaluate the effectiveness of standard defenses proposed for label-only MIAs against our attack.
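The extract-then-infer flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the method evaluated in this work: the target is a toy linear classifier wrapped in a hypothetical `LabelOnlyTarget` class that returns hard labels and counts queries, the surrogate is a plain logistic regression, active sampling is approximated by picking low-margin points from a synthetic Gaussian pool, and the membership signal is a generic robustness-to-noise score. Because the toy target does not memorize any training data, the scores carry no real membership information; the sketch only shows how the query cost is paid once during extraction and not again during inference.

```python
import numpy as np


class LabelOnlyTarget:
    """Hypothetical black-box target M: returns hard labels only and counts queries."""

    def __init__(self, w, b):
        self._w, self._b = w, b
        self.queries = 0

    def predict(self, X):
        X = np.atleast_2d(X)
        self.queries += len(X)
        return (X @ self._w + self._b > 0).astype(int)


def fit_logistic(X, y, steps=500, lr=0.5):
    """Gradient-descent logistic regression, used here as the surrogate S."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b


def extract_surrogate(target, dim, budget, rng, rounds=4):
    """Spend a fixed query budget once: a random seed set, then rounds of
    low-margin (uncertain) points chosen from a synthetic candidate pool."""
    per_round = budget // (rounds + 1)
    X = rng.normal(size=(per_round, dim))
    y = target.predict(X)
    w, b = fit_logistic(X, y)
    for _ in range(rounds):
        pool = rng.normal(size=(20 * per_round, dim))
        margin = np.abs(pool @ w + b)          # small margin = S is unsure
        pick = pool[np.argsort(margin)[:per_round]]
        X = np.vstack([X, pick])
        y = np.concatenate([y, target.predict(pick)])
        w, b = fit_logistic(X, y)
    return w, b


def robustness_score(w, b, x, rng, n_noise=50, sigma=0.5):
    """Label-only signal computed on S alone: the fraction of noisy copies
    of x that keep the label S assigns to x."""
    base = int(x @ w + b > 0)
    noisy = x + sigma * rng.normal(size=(n_noise, x.size))
    return float(np.mean((noisy @ w + b > 0).astype(int) == base))


rng = np.random.default_rng(0)
dim = 10
target = LabelOnlyTarget(rng.normal(size=dim), 0.0)

# One-time extraction phase: all queries to M happen here.
w_s, b_s = extract_surrogate(target, dim, budget=500, rng=rng)
spent = target.queries

# Inference phase: candidates are scored on S, so M is never queried again.
candidates = rng.normal(size=(200, dim))
scores = np.array([robustness_score(w_s, b_s, x, rng) for x in candidates])
```

In a full attack, a threshold on `scores` would separate predicted members from non-members; how that threshold is calibrated without confidence scores or shadow models is left open in this sketch.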