Machine unlearning, the process of removing the influence of specific training samples from a pre-trained model, has attracted significant attention in recent years. Although extensive research has focused on developing efficient unlearning strategies, we argue that these methods mainly aim at removing the samples themselves rather than their influence on the model, thereby overlooking the fundamental definition of machine unlearning. In this paper, we conduct a comprehensive study of the effectiveness of existing unlearning schemes when the training dataset contains many samples similar to those targeted for unlearning. Specifically, we ask: do existing unlearning methods truly adhere to the original definition of machine unlearning and effectively eliminate all influence of the target samples when similar samples are present in the training dataset? Our extensive experiments on four carefully constructed datasets, together with thorough analysis, reveal a notable gap between the expected and actual performance of most existing unlearning methods for image and language models, even for the retraining-from-scratch baseline. Finally, we explore potential solutions to enhance current unlearning approaches.