Membership Inference Attacks (MIAs) pose a significant privacy risk by enabling adversaries to determine whether a specific data point was part of a model's training set. This work empirically investigates whether Machine Unlearning (MU) algorithms can serve as a targeted, active defense mechanism in scenarios where a post-training privacy audit identifies specific classes or individuals as highly susceptible to MIAs. By ``dulling'' the model's categorical memory of these samples, the process effectively mitigates the membership signal and reduces the MIA success rate for the most vulnerable users. We evaluate the defensive potential of three MU algorithms, Negative Gradient (NegGrad), SCalable Remembering and Unlearning unBound (SCRUB), and Selective Fine-tuning and Targeted Confusion (SFTC), across four diverse datasets and three complexity-based model groups. Our findings reveal that MU can function as a countermeasure against MIAs, though its success is critically contingent on algorithm choice, model capacity, and a pronounced sensitivity to the learning rate. While Negative Gradient often induces a generalized degradation of membership signals across both the forget and retain sets, SFTC exposes a critical ``divergence effect'' in which targeted forgetting reinforces the membership signal of retained data. Conversely, SCRUB provides a more balanced defense, with minimal collateral impact from the MIA perspective.
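The Negative Gradient baseline discussed above is commonly realized as gradient ascent on the forget-set loss: rather than minimizing the loss on samples to be forgotten, the optimizer maximizes it, pushing the model away from its memorized predictions. A minimal PyTorch sketch of that idea, assuming a standard classification model (the function and parameter names are illustrative, not taken from the paper):

```python
# Illustrative sketch of negative-gradient (gradient-ascent) unlearning
# on a batch of forget-set samples. Names are hypothetical, not the
# paper's implementation.
import torch
import torch.nn as nn

def negative_gradient_step(model, forget_batch, lr=1e-3):
    """Perform one ascent step on the cross-entropy loss of a
    forget-set batch, weakening the model's fit to those samples."""
    x, y = forget_batch
    loss = nn.functional.cross_entropy(model(x), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p += lr * p.grad  # ascend the loss instead of descending
    return loss.item()
```

The abstract's observation that this method is highly learning-rate sensitive is visible in this form: too small an `lr` leaves the membership signal intact, while too large an `lr` degrades the model (and the membership signal) on retained data as well.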