ImpMIA: Leveraging Implicit Bias for Membership Inference Attack

Determining which data samples were used to train a model, the goal of a Membership Inference Attack (MIA), is a well-studied and important problem with implications for data privacy. State-of-the-art (SotA) methods, which are black-box attacks, rely on training many auxiliary reference models to imitate the behavior of the attacked model. As such, they depend on assumptions that rarely hold in real-world settings: (i) the attacker knows the training hyperparameters; (ii) all available non-training samples come from the same distribution as the training data; and (iii) the fraction of training data in the evaluation set is known. We show that removing these assumptions significantly harms the performance of black-box attacks. We introduce ImpMIA, a Membership Inference Attack that exploits the implicit bias of neural networks. Building on the maximum-margin implicit bias theory, ImpMIA uses the Karush-Kuhn-Tucker (KKT) optimality conditions to identify training samples: those whose gradients most strongly reconstruct the trained model's parameters. Our approach is optimization-based and requires no training of reference models, thus removing the need for any knowledge of, or assumptions about, the attacked model's training procedure. Although ImpMIA is a white-box attack (a setting that assumes access to the model weights), this setting is becoming increasingly realistic given that many models are publicly available (e.g., via Hugging Face). ImpMIA achieves SotA performance compared to both black-box and white-box attacks in settings where only the model weights are known and a superset of the training data is available.
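
The abstract does not state the exact objective ImpMIA optimizes, so the following is only a minimal sketch of the general idea. It assumes the binary-classification form of the KKT stationarity condition from maximum-margin implicit bias theory, in which the trained parameters satisfy θ ≈ Σ_i λ_i y_i ∇_θ f(θ; x_i) with λ_i ≥ 0, and it treats the fitted coefficient λ_i as the membership score of candidate i. The function name `kkt_membership_scores`, the precomputed per-sample gradients, and the plain projected-gradient solver are illustrative assumptions and are not taken from the paper.

```python
def kkt_membership_scores(theta, grads, labels, steps=5000):
    """Hypothetical helper: fit lam >= 0 so that sum_i lam_i * y_i * g_i ~ theta.

    theta  : list of d floats, the (flattened) trained parameters
    grads  : n lists of d floats, assumed to hold d f(theta; x_i) / d theta
    labels : n labels in {-1, +1}
    Returns one non-negative coefficient per candidate (larger = more member-like).
    """
    n, d = len(grads), len(theta)
    # Signed gradients y_i * g_i, one row per candidate sample.
    signed = [[labels[i] * grads[i][k] for k in range(d)] for i in range(n)]
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the
    # least-squares gradient, so 1 / frob_sq is a safe, conservative step size.
    frob_sq = sum(v * v for row in signed for v in row)
    lr = 1.0 / frob_sq
    lam = [0.0] * n
    for _ in range(steps):
        # Residual of the stationarity condition: sum_i lam_i * y_i * g_i - theta.
        residual = [sum(lam[i] * signed[i][k] for i in range(n)) - theta[k]
                    for k in range(d)]
        # Gradient step on 0.5 * ||residual||^2, then projection onto lam >= 0.
        lam = [max(0.0, lam[i] - lr * sum(signed[i][k] * residual[k] for k in range(d)))
               for i in range(n)]
    return lam


# Toy candidate pool (assumed data): theta is built from candidates 0 and 2 only.
labels = [1, -1, 1, -1]
grads = [[1, 0, 0, 0], [-1, -1, 0, 0], [0, 1, 1, 0], [0, 0, -1, -1]]
theta = [2.0, 1.0, 1.0, 0.0]  # equals 2 * y_0 * g_0 + 1 * y_2 * g_2
scores = kkt_membership_scores(theta, grads, labels)
# Rank candidates from most to least member-like.
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

In this toy pool, candidates 0 and 2 are the only ones used to build `theta`, so they receive the largest coefficients and the other two are driven to zero. For a real network the gradients would come from automatic differentiation over the full parameter vector, and the candidate pool would be the available superset of the training data.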
