Machine unlearning (MU) aims to expunge a designated forget set from a trained model without costly retraining, yet existing techniques overlook two critical blind spots: "over-unlearning," which degrades performance on retained data near the forget set, and post-hoc "relearning" attacks that aim to resurrect the forgotten knowledge. Focusing on class-level unlearning, we first derive an over-unlearning metric, OU@epsilon, which quantifies collateral damage in regions proximal to the forget set, where over-unlearning mainly appears. Next, we expose an unforeseen relearning threat against MU, the Prototypical Relearning Attack, which exploits the per-class prototype of the forget class with just a few samples and easily restores pre-unlearning performance. To counter both blind spots in class-level unlearning, we introduce Spotter, a plug-and-play objective that combines (i) a masked knowledge-distillation penalty on the region near the forget classes to suppress OU@epsilon, and (ii) an intra-class dispersion loss that scatters forget-class embeddings, neutralizing Prototypical Relearning Attacks. Spotter achieves state-of-the-art results on the CIFAR, TinyImageNet, and CASIA-WebFace datasets, offering a practical remedy for unlearning's blind spots.
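The relearning threat can be illustrated with a toy sketch. The idea behind the Prototypical Relearning Attack, as the abstract describes it, is that a forgotten class often still forms a tight cluster in the unlearned model's embedding space, so its per-class prototype (mean embedding) can be estimated from a handful of samples and used to classify the class again. The example below is a minimal illustration with synthetic 2-D "embeddings" standing in for a model's features; all names and the nearest-prototype decision rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding space: Gaussian clusters stand in for a model's
# penultimate-layer features for two retained classes and one forgotten class.
def sample_class(center, n):
    return center + 0.1 * rng.standard_normal((n, 2))

centers = {0: np.array([1.0, 0.0]),
           1: np.array([0.0, 1.0]),
           2: np.array([-1.0, 0.0])}
forget_class = 2

# The attacker sees only a few samples of the forgotten class.
few_shot = sample_class(centers[forget_class], 5)

# Prototypes: mean embedding per class. The forgotten class's prototype is
# estimated from the few-shot samples alone.
prototypes = {c: sample_class(mu, 50).mean(axis=0)
              for c, mu in centers.items() if c != forget_class}
prototypes[forget_class] = few_shot.mean(axis=0)

def predict(x):
    # Nearest-prototype classification.
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c]))

# Held-out samples of the forgotten class are recovered with high accuracy,
# because its embeddings still cluster tightly after (naive) unlearning.
held_out = sample_class(centers[forget_class], 100)
acc = np.mean([predict(x) == forget_class for x in held_out])
```

Because the forgotten class's embeddings remain tightly clustered, `acc` is near 1.0 here, which is the failure mode the attack exploits.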
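Spotter's stated countermeasure is an intra-class dispersion loss that scatters forget-class embeddings so no single prototype summarizes the class. A minimal sketch of one such objective, assuming a simple negative mean pairwise squared distance (the paper's exact formulation may differ):

```python
import numpy as np

def dispersion_loss(z):
    """Illustrative intra-class dispersion objective.

    z: (n, d) array of forget-class embeddings. Minimizing the negative
    mean pairwise squared distance pushes the embeddings apart, so a
    few-shot prototype of the class becomes uninformative.
    """
    diffs = z[:, None, :] - z[None, :, :]   # (n, n, d) pairwise differences
    sq = (diffs ** 2).sum(axis=-1)          # (n, n) squared distances
    n = z.shape[0]
    return -sq.sum() / (n * (n - 1))        # mean over distinct pairs, negated

rng = np.random.default_rng(0)
tight = 1.0 + 0.01 * rng.standard_normal((8, 4))   # clustered embeddings
spread = rng.uniform(-1.0, 1.0, size=(8, 4))       # scattered embeddings
# The scattered set attains a lower (more negative) loss than the tight one,
# so gradient descent on this loss drives embeddings apart.
```

In a training loop this term would be added to the unlearning objective for forget-class inputs only, alongside the masked distillation penalty on retained data near the forget classes.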